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Abstract:  The  linear  programming  (LP)  is  one  of  the  most  popular  necessary  optimization  tool  used 
for  data  analytics  as  well  as  in  various  scientific  fields.  However,  the  current  state-of-art  algorithms 
suffer  from  scalability  issues  when  processing  Big  Data.  For  example,  the  commercial  optimization 
software  IBM  CPLEX  cannot  handle  an  LP  with  more  than  hundreds  of  thousands  variables  or 
constraints.  Existing  algorithms  are  fundamentally  hard  to  scale  because  they  are  inevitably  too 
complex  to  parallelize.  To  address  the  issue,  we  study  the  possibility  of  using  the  Belief  Propagation 
(BP)  algorithm  as  an  LP  solver.  BP  has  shown  remarkable  performances  on  various  machine  learning 
tasks  and  it  naturally  lends  itself  to  fast  parallel  implementations.  Despite  this,  very  little  work  has 
been  done  in  this  area.  In  particular,  while  it  is  generally  believed  that  BP  implicitly  solves  an 
optimization  problem,  it  is  not  well  understood  under  what  conditions  the  solution  to  a  BP  converges 
to  that  of  a  corresponding  LP  formulation. 

Our  efforts  consist  of  two  main  parts.  First,  we  perform  a  theoretic  study  and  establish  the  conditions 
in  which  BP  can  solve  LP  [1,2].  Although  there  has  been  several  works  studying  the  relation  between 
BP  and  LP  for  certain  instances,  our  work  provides  a  generic  condition  unifying  all  prior  works  for 
generic  LP.  Second,  utilizing  our  theoretical  results,  we  develop  a  practical  BP-based  parallel 
algorithms  for  solving  generic  LPs,  and  it  shows  71x  speed  up  while  sacrificing  only  0.1%  accuracy 
compared  to  the  state-of-art  exact  algorithm  [3,4]. 

As  a  result  of  the  study,  the  Pis  have  published  two  conference  papers  [1,3]  and  two  follow-up  journal 
papers  [3,4]  are  under  submission.  We  refer  the  readers  to  our  published  work  [1,3]  for  details. 


Introduction:  The  main  goal  of  our  research  is  to  develop  a  distributed  and  parallel  algorithm  for 
large-scale  linear  optimization  (or  programming).  Considering  the  popularity  and  importance  of  linear 
optimizations  in  various  fields,  the  proposed  method  has  great  potentials  applicable  to  various  big  data 
analytics.  Our  approach  is  based  on  the  Belief  Propagation  (BP)  algorithm,  which  has  shown 
remarkable  performances  on  various  machine  learning  tasks  and  naturally  lends  itself  to  fast  parallel 
implementations. 

Our  key  contributions  are  summarized  below: 

1)  We  establish  key  theoretic  foundations  in  the  area  of  Belief  Propagation.  In  particular,  we 
show  that  BP  converges  to  the  solution  of  LP  if  some  sufficient  conditions  are  satisfied.  Our 
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conditions  not  only  cover  various  prior  studies  including  maximum  weight  matching,  min- 
cost  network  flow,  shortest  path,  etc.,  but  also  discover  new  applications  such  as  vertex  cover 
and  traveling  salesman. 

2)  While  the  theoretic  study  provides  understanding  of  the  nature  of  BP,  it  falls  short  in  slow 
convergence  speed,  oscillation  and  wrong  convergence.  To  make  BP-based  algorithms  more 
practical,  we  design  a  BP-based  framework  which  uses  BP  as  a  ‘weight  transformer’  to 
resolve  the  convergence  issue  of  BP. 

We  refer  the  readers  to  our  published  work  [1,  3]  for  details.  The  rest  of  the  report  contains  a 
summary  of  our  work  appeared  in  UAI  (Uncertainty  in  Artificial  Intelligence)  and  IEEE  Conference 
in  Big  Data  [1,3]  and  follow  up  work  [2,4]  under  submission  to  major  journals. 


Experiment:  We  first  establish  theoretical  conditions  when  Belief  Propagation  (BP)  can  solve  Linear 
Programming  (LP),  and  second  provide  a  practical  distributed/parallel  BP-based  framework  solving 
generic  optimizations.  We  demonstrate  the  wide-applicability  of  our  approach  via  popular 
combinatorial  optimizations  including  maximum  weight  matching,  shortest  path,  traveling  salesman, 
cycle  packing  and  vertex  cover. 


Results  and  Discussion:  Our  contribution  consists  of  two  parts:  Study  1  [1,2]  looks  at  the  theoretical 
conditions  that  BP  converges  to  the  solution  of  LP.  Our  theoretical  result  unify  almost  all  prior  result 
about  BP  for  combinatorial  optimization.  Furthermore,  our  conditions  provide  a  guideline  for 
designing  distributed  algorithm  for  combinatorial  optimization  problems.  Study  2  [3,4]  focuses  on 
building  an  optimal  framework  based  on  the  theory  of  Study  1  for  boosting  the  practical  performance 
of  BP.  Our  framework  is  generic,  thus,  it  can  be  easily  extended  to  various  optimization  problems.  We 
also  compare  the  empirical  performance  of  our  framework  to  other  heuristics  and  state  of  the  art 
algorithms  for  several  combinatorial  optimization  problems. 

- Study  1  - 

We  first  introduce  the  background  for  our  contributions.  A  joint  distribution  of  □  (binary)  variables 
□  =  [□□]  G  {0,7}l  is  called  graphical  model  (GM)  if  it  factorizes  as  follows:  for  □  =  [flip]  E  {0,1}U, 

Pi \Z  =  z]  OC  JJ  1pi(Zi)  JJ  i>a(za), 

a  €F 

where  x/jn  ,  □  gj  are  some  non-negative  functions  so  called  factors;  □  is  a  collection  of  subsets 

F  =  {ai,a2,...,ak}  C  2*1’2 . 

(each  au  is  a  subset  of  { 1 ,  •••/!□}  with  |  Qq|  >  2;  Qn  is  the  projection  of  □  onto  dimensions  included 
in  a.  Assignment  □*  is  called  maximum-a-posteriori  (MAP)  assignment  if  □'‘maximizes  the 
probability. 

The  following  figure  depicts  the  graphical  relation  between  factors  □  and  variables  □ . 
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Figure  1:  Factor  graph  for  the  graphical  model  with  factors  =  {1,3},  □2  =  {1,2,4},  U3  =  {2,3,4} 


Now  we  introduce  the  algorithm,  (max-product)  BP,  for  approximating  MAP  assignment  in  a 
graphical  model.  BP  is  an  iterative  procedure;  at  each  iteration  □,  there  are  four  messages 

{m^(c),m|4a(c) :  cg  {0,1}} 


between  each  variable  □□  and  every  associated  a  E  Op,  where  □□:  =  {□  efSO  ED}.  Then, 
messages  are  updated  as  follows: 


max 

£«  ■  — C 

)  n  m^n(2j) 

(1) 

LaV)  - 

A(c)  F[ 

(2) 

a*EFi\a 

Finally,  given  messages,  BP  marginal  beliefs  are  computed  as  follows: 

bi[Zi]  =  FI  ™>cx-n(Zi)-  (3) 

ctEFj 

Then,  BP  outputs  the  approximated  MAP  assignment  a:on  =  [  f  ]  as 


{  1  if  b,[l]  >  b, [0] 

<  ?  if  6i[l]  =  k[0]  . 
[O  if  &i[l]  <  MO] 


Now,  we  are  ready  to  introduce  the  main  result  of  Study  1.  Consider  the  following  GM:  for  □  = 
[Qfa]  G  {0,1}°  and  □  =  [□□]  e  Dn, 

Pv[X  =  x]  oc  FI  e~WiXi  n  Mxa),  (4) 

i  a€F 

where  the  factor  function  ipa  for  a  E  D  is  defined  as 


1pa{%a) 


1  if  AaXct  ^  ba,  CqXq  —  da 

0  otherwise 


for  some  matrices  Qtg  and  vectors  Dq,  Dd.  Consider  the  Linear  Programming  (LP)  corresponding 
the  above  GM: 


minimize  w  •  x 

subject  to  tbaiXa)  =  1,  Vet  e  F  (5) 

x  =  [Xi]  €  [0,  If. 

One  can  easily  observe  that  the  MAP  assignments  for  GM  corresponds  to  the  (optimal)  solution  of  the 
above  LP  if  the  LP  has  an  integral  solution  □*  E  {0,1}U .  The  following  theorem  is  our  main  result  of 
Study  1  which  provide  sufficient  conditions  so  that  BP  can  indeed  find  the  LP  solution 
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Theorem  1  The  max-product  BP  on  GM  (4)  with  arbitrary  initial  message  converges 
to  the  solution  of  LP  (5)  if  the  following  conditions  hold: 


CL  LP  (5)  has  a  unique  integral  solution  x*  £  {0,  I  }?\  i.e.,  it  is  tight. 

C2 .  For  every  i  £  {1,2,...,  n},  the  number  of  factors  associated  with  xz  is  at  most 
two,  i.e.,  |jpj|  <  2. 

C3.  For  every  factor  every  xQ  £  {0,  l}lal  with  ^Q(xQ)  =  1,  and  every  i  £  a  with 
Xi  7^  x*,  there  exists  7  C  a  such  that 


\{j  €  {i}  U7  :  |Fj|  =  2}|  <  2 


Theorem  1  can  be  applied  to  several  combinatorial  optimization  problems  including  matching, 
network  flow,  shortest  path,  vertex  cover,  etc.  See  [1,2]  for  the  detailed  proof  of  Theorem  1  and  its 
applications  to  various  combinatorial  optimizations  including  maximum  weight  matching,  min-cost 
network  flow,  shortest  path,  vertex  cover  and  traveling  salesman. 


Study  2 


Study  2  mainly  focuses  on  providing  a  distributed  generic  BP-based  combinatorial  optimization 
solver  which  has  high  accuracy  and  low  computational  complexity.  In  summary,  the  key  contributions 
of  Study  2  are  as  follows: 

1)  Practical  BP-based  algorithm  design:  To  the  best  of  our  knowledge,  this  paper  is  the  first  to 
propose  a  generic  concept  for  designing  BP-based  algorithms  that  solve  large-scale 
combinatorial  optimization  problems. 

2)  Parallel  implementation:  We  also  demonstrate  that  the  algorithm  is  easily  parallelizable.  For 
the  maximum  weighted  matching  problem,  this  translates  to  7 lx  speed  up  while  sacrificing 
only  0.1%  accuracy  compared  to  the  state-of-art  exact  algorithm. 

3)  Extensive  empirical  evaluation:  We  evaluate  our  algorithms  on  three  different  combinatorial 
optimization  problems  on  diverse  synthetic  and  real-world  data-sets.  Our  evaluation  shows 
that  the  framework  shows  higher  accuracy  compared  to  other  known  heuristics. 

Designing  a  BP-based  algorithm  for  some  problem  is  easy  in  general.  However  (a)  it  might  diverge  or 
converge  very  slowly,  (b)  even  if  it  converges  quickly,  the  BP  decision  might  be  not  correct,  and  (c) 
even  worse,  BP  might  produce  an  infeasible  solution,  i.e.,  it  does  not  satisfy  the  constraints  of  the 
problem. 
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Input 


Output 


(1)  BP  Weight  Transforming 

Figure  2:  Overview  of  our  generic  BP-based  framework 


{2}  Post- Processing 


To  address  these  issues,  we  propose  a  generic  BP-based  framework  that  provides  highly  accurate 
approximate  solutions  for  combinatorial  optimization  problems.  The  framework  has  two  steps,  as 
shown  in  Figure  2.  In  the  first  phase,  it  runs  a  BP  algorithm  for  a  fixed  number  of  iterations  without 
waiting  for  convergence.  Then,  the  second  phase  runs  a  known  heuristic  using  BP  beliefs  instead  of 
the  original  weights  to  output  a  feasible  solution.  Namely,  the  first  and  second  phases  are  respectively 
designed  for  ‘BP  weight  transforming’  and  ‘post-processing’.  Note  that  our  evaluation  mainly  uses 
the  maximum  weight  matching  problem.  The  formal  description  of  the  maximum  weight  matching 
(MWM)  problem  is  as  follows:  Given  a  graph  □  =  (□,  □)  and  edge  weights  0  =  [Qp]  G  it 
finds  a  set  of  edges  such  that  each  vertex  is  connected  to  at  most  one  edge  in  the  set  and  the  sum  of 
edge  weights  in  the  set  is  maximized.  The  problem  is  formulated  as  the  following  IP  (Integer 
Programming): 

maximize  w  •  x 

s.t.  yt  xe  <  1,  Vv  €  V,  x  =  [xe]  e  {(),  l}|£|, 

e€<5(t>) 

where  5(D)  is  the  set  of  edges  incident  to  vertex  □  E  □.  In  the  following  paragraphs,  we  describe  the 
two  phases  in  more  detail  in  reverse  order. 


We  first  describe  the  post-processing  phase.  As  we  mentioned,  one  of  the  main  issue  of  a  BP-based 
algorithm  is  that  the  decision  on  BP  beliefs  might  give  an  infeasible  solution.  To  resolve  the  issue,  we 
use  post-processing  by  utilizing  existing  heuristics  to  the  given  problem  that  find  a  feasible  solution. 
Applying  post-processing  ensures  that  the  solution  is  at  least  feasible.  In  addition,  our  key  idea  is  to 
replace  the  original  weights  by  the  logarithm  of  BP  beliefs,  i.e.  function  of  (3).  After  this,  we  apply 
known  heuristics  using  the  logarithm  of  BP  beliefs  to  achieve  higher  accuracy. 

To  confirm  the  effectiveness  of  the  proposed  post-processing  mechanism,  we  compare  it  with  the 
following  two  alternative  post-processing  schemes  for  the  maximum  weight  matching  problem  that 
remove  edges  to  enforce  matching  after  BP  processing  in  a  naive  manner: 


Random:  If  there  exists  a  vertex  □  such  that  more  than  one  neighboring  edges  are  selected  on 
the  BP  decision,  randomly  select  one  edge  and  remove  other  edges. 

Weight:  If  there  exists  a  vertex  □  such  that  more  than  one  neighboring  edges  are  selected  on 
the  BP  decision,  remove  edges  of  smaller  weight. 

Figure  3  compares  the  approximation  ratio  obtained  using  BP-belief-based  post-processing  versus  the 
naive  post-processing  heuristics  (random  and  weight).  It  shows  that  the  proposed  BP-belief-based 
post-processing  outperforms  the  rest.  Note,  the  results  in  Figure  3  were  obtained  by  first  applying  BP 
message  passing  for  weight  transformation.  Next,  we  explain  how  this  is  done  in  our  framework. 
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■  Random  E  Weight  L>  BP  Message 
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fa) 


Initial  Message  Value  /  Edge  Weight 
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Figure  3:  a)  Average  approximation  ratio  for  different  post-processing  schemes.  We  use  a  local 
greedy  algorithm  as  a  post-processing  based  on  original  weights  and  BP  messages  (i.e.,  beliefs).  The 
‘Random  selection’  post-processing  is  also  compared,  b)  Effects  of  initial  messages  on  the 


Q  □-»(□,□)  (Q)  


convergence  of  BP.  We  set  0  □  □  :  “  0  QQ  ^ 

nn-»(Qfl)  CO 

c)  Approximation  ratio  for  different  initial  messages  Owg  —  0,  Uuu/ 2,  □□q 


—  □□  □□  where  x-axis  represent  the  value  ofD. 


Now,  we  describe  the  BP  weight  transforming  phase.  To  improve  the  approximation  quality  and  solve 
the  convergence  issues,  we  use  three  modifications  to  the  standard  BP  algorithm:  (1)  careful 
initialization  on  messages,  (2)  noise  addition  and  (3)  hybrid  damping  on  message  updates. 


Message  Initialization.  The  standard  message  initialization  is  □  _>n°  =  0  =  1  for  the  maximum 

weight  matching  problem.  However,  the  convergence  rate  of  BP  depends  on  the  initialized  messages. 
As  reported  in  Figure  4,  we  try  different  initializations  by  varying  the  log  ratio  □□□:  = 

n  ^  ( o') 

□  □  □  ■  D^(D,D)Q  =  JMkq  for  0  <  □  <  7,  where  the  case  3  =  0.5  shows  the  fastest  convergence. 

fto-KW)  CO 

The  choice  □□□  =  0.50  alleviates  the  fluctuation  behavior  of  BP  and  boosts  up  its  convergence 
speed.  We  remind  that,  under  our  framework,  BP  runs  only  for  a  fixed  number  of  iterations  since  it 
might  converge  too  slowly,  even  with  the  initialization  Q'ga  =  0.5 0$#  for  practical  purposes.  With 
fixed  number  of  iterations,  careful  initialization  becomes  even  more  critical  as  experimental  results  in 
Figure  3(c)  and  Figure  4  suggest.  For  example,  if  one  runs  5000  iterations  of  BP,  they  show  that  the 
standard  initialization  achieves  at  most  30%  approximation  ratio,  while  the  proposed  method  achieves 
99%.  Moreover,  one  can  also  observe  that  the  advantage  of  more  BP  updates  diminishes  as  the 
number  of  iterations 


■  20  iterations  ■  100  iterations  □  5000  iterations 


Initial  Message  Value  /  Edge  Weight 

Figure  4:  Effects  of  initial  messages  on  the  number  of  BP  iterations.  We  set  Dnn  =  □  □||]  for  a  value 
c  of  x-axis. 


Noise  Addition.  The  BP  algorithm  often  oscillates  when  the  MAP  solution  is  not  unique.  To  address 
this  issue,  we  transform  the  original  problem  to  one  that  has  a  unique  solution  with  high  probability 
by  adding  small  noises  to  the  weights.  We  apply  this  to  all  cases.  Here,  one  has  to  be  careful  in 
deciding  the  range  □  of  noises.  If  □  is  too  large,  the  quality  of  BP  solution  deteriorates  because  the 
optimal  solution  might  have  changed  from  the  original  problem.  On  the  other  hand,  if  □  is  too  small 
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compared  to  we,  BP  converges  very  slowly.  To  achieve  a  balance,  we  choose  the  range  r  of  noise  re  as 
10%  of  the  minimum  distance  among  weights.  We  find  that  this  results  in  over  99.8%  approximation 
ratio  even  when  the  solution  is  not  unique,  which  has  little  difference  with  that  of  unique  solution  as 
shown  in  Table  I. 


#  vertices 
(#  edges) 

Approximation  Ratio 

Difference 

Multiple  optima 

Unique  optimum 

Ik  (50k) 

99.88  % 

99.90  % 

-0.02  % 

5k  (250k) 

99.86  % 

99.85  % 

+0.01  % 

10k  (500k) 

99.85  % 

99.84  % 

+0.01  % 

20k  (1 M) 

99.84  % 

99.83  % 

+0.01  % 

Table  I:  Approximation  ratio  of  BP  for  MWM  with  multiple  optima  and  a  unique  optimum.  We 
introduce  a  small  noise  to  the  edge  weights  and  set  the  initial  message  by  dtja  =  Don  12. 


Hybrid  Damping.  To  boost  up  the  convergence  speed  of  BP  updates  further,  we  use  a  specific 
damping  strategy  to  alleviate  message  oscillation.  We  update  messages  to  be  the  average  of  old  and 
new  messages  as  follows: 

4Xj  <“  .  max  { max  {  wik  -  4U»>  ()}  } 


(4 


-j +«&})/ 2. 


We  note  that  the  damping  strategy  provides  a  similar  effect  as  our  proposed  initialization  □□□  = 
□o  n/2.  Hence,  if  one  uses  both,  the  effect  of  one  might  be  degraded  due  to  the  other.  Due  to  this,  we 
first  run  the  half  of  BP  iterations  without  damping  (this  is  for  keeping  the  effect  of  the  proposed 
initialization)  and  perform  the  last  half  of  BP  iterations  with  damping.  As  reported  in  Table  II,  this 
hybrid  approach  outperforms  other  alternatives,  including  (a)  no  use  of  damping,  (b)  using  damping  in 
every  iteration,  and  (c)  damping  in  the  first  half  of  BP  iterations  and  no-damping  in  the  last  half. 


#  vertices 
(#  edges) 

Approximation  Ratio 

nO'dampf  100) 

damp  (100) 

no-damp(50) 

+damp(50) 

damp(50) 

+no-damp(50) 

10k  (500k) 

99.58  % 

99.69  % 

99.83  % 

99.56  % 

20k  (JJV1) 

99.55  % 

99.68  % 

99,82  % 

99.56  % 

50k  (2.5M) 

99.56  % 

99.69  % 

99,83  % 

99.57  % 

100k  (5M) 

99.56  % 

99.69  % 

99.83  % 

99.57  % 

Table  II:  Approximation  ratio  of  BP  without  damping,  BP  with  damping,  BP  with  damping  only  for 
first  50  iterations,  and  BP  with  damping  for  last  50  iterations.  We  introduce  a  small  noise  to  the  edge 
weights  and  set  the  initial  message  by  □til  =  □□□  A 

Now  we  describe  the  implementation,  mostly  parallelization,  of  our  framework.  First,  we  introduce 
asynchronous  message  update  that  enables  efficient  parallelization  of  BP  message  passing.  Second, 
we  illustrate  the  issues  in  parallelizing  post-processing.  Finally,  we  describe  the  parallel 
implementations  of  our  algorithm  and  their  benefits. 

Asynchronous  Message  Update.  For  parallelization,  we  first  divide  the  graph  by  partitioning  the 
vertices,  and  assign  each  partition  to  a  single  thread.  However,  if  we  naively  parallelize  the  process 
using  multiple  threads,  frequent  synchronization  may  incur  large  overhead.  Thus,  we  apply 
asynchronous  message  update  where  each  vertex  updates  the  message  value  right  after  new  message 
value  is  calculated  and  eliminate  synchronization  point  between  iterations.  This  makes  the  process 
faster  because  of  the  reduced  synchronization  points.  Figure  5  shows  that  performance  improvement 
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(speed  up  in  running  time)  of  asynchronous  update  over  synchronous  is  up  to  237%  in  our  example 
graph  for  the  maximum  weight  matching  problem  with  16  threads 
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Figure  5:  Average  running  time  of  our  BP-based  algorithm  with  synchronous  message  update  and 
asynchronous  message  update. 


Local  Post-Processing.  The  second  phase  of  our  algorithm  runs  existing  heuristics  for  post-processing 
to  enforce  the  feasibility  of  BP  decisions.  While  the  framework  works  with  any  heuristics-based  post¬ 
processing  methods,  for  the  entire  process  to  be  parallel,  it  is  important  that  the  post-processing  step  is 
also  parallel.  An  important  criterion  for  efficient  parallelization  is  locality  of  computation;  if  the  post¬ 
processing  heuristics  can  compute  the  result  locally  without  requiring  global  knowledge,  they  can  be 
easily  parallelized.  Moreover,  if  they  do  not  require  synchronization,  the  running  time  can  be  further 
reduced. 


Parallel  Implementation.  The  BP  algorithm  is  easy  to  parallelize  because  of  its  message  passing 
nature.  To  demonstrate  this,  we  parallelize  our  BP-based  framework  using  three  platforms:  GraphChi, 
OpenMP  and  pthread. 

Now  we  show  the  empirical  performance  of  our  framework  using  three  popular  combinatorial 
optimization  problems:  maximum  weight  matching,  minimum  weight  vertex  cover  (MWVC)  and 
maximum  weight  independent  set  problem  (MWIS).  We  already  introduced  the  IP  formulation  of  the 
maximum  weight  matching  (MWM),  where  those  of  the  minimum  weight  vertex  cover  (MWVC)  and 
maximum  weight  independent  set  problem  (MWIS)  are  as  follows: 

MWVC:  minimize  w  *  x 

subject  to  xu  +  xv  >1.  Ve  —  (u.  v)  £  E 

x=[xv}€{0;  l}\y\ 

MWIS:  maximize  w  *  x 

subject  to  xu  +  xv  <  1,  Ve  —  (u|,t>)  £  E 


Experimental  Setup.  In  our  experiments,  both  real-world  and  synthetic  datasets  are  used  for 
evaluation.  For  MWM,  we  used  data-sets  from  the  university  of  Florida  sparse  matrix  collection.  For 
larger  scale  synthetic  evaluation,  we  generate  Erdos-Renyi  random  graphs  (up  to  50  million  vertices 
with  2.5  billion  edges)  with  average  vertex  degree  of  100  with  edge  weights  drawn  independently 
from  the  uniform  random  distribution  over  the  interval  [0,  1].  For  MWVC  and  MWIS,  we  use  the  frb- 
series  from  BHOSLIB,  where  it  also  contains  the  optimal  solutions.  We  note  that  we  perform  no 
experiment  using  synthetic  data-sets  for  MWVC  and  MWIS  since  they  are  NP-hard  problems,  i.e., 
impossible  to  compute  the  optimal  solutions.  On  the  other  hand,  for  MWM  the  Edmonds’  Blossom 
algorithm  can  compute  the  optimal  solution  in  polynomial  time.  All  experiments  in  this  section  are 
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conducted  on  a  machine  with  Intel  Xeon(R)  CPU  E5-2690  @  2.90GHz  with  8  cores  and  8 
hyperthreads  with  128GB  of  memory,  unless  otherwise  noted. 


Approximation  Ratio.  We  now  demonstrate  our  BP-based  approximation  algorithm  produces  highly 
accurate  results.  In  particular,  we  show  that  our  BP-based  algorithms  outperform  well-known 
heuristics  for  MWVC,  MWIS  and  closely  approximate  exact  solutions  for  MWM  for  all  cases  we 
evaluate. 

For  MWM,  we  compare  the  approximation  qualities  of  serial,  synchronous  BP  and  parallel, 
asynchronous  implementation  on  both  synthetic  and  real-world  data-sets,  where  we  compute 
the  optimal  solution  using  the  Blossom  algorithm  to  measure  the  approximation  ratios.  Table 
III  summarize  our  experimental  results  for  MWM  for  the  synthetic  data-sets  and  the  Florida 
data.  Our  BP-based  algorithm  achieves  99%  to  99.9%  approximation  ratios. 


A  p  p  rox  i  m  at  i  on  Ratio 

Serial  BP 
(1  thread) 

Parallel  BP 
(16  threads) 

Synthetic 

#  vertices 
(#  edges) 

500k  (25M) 
1M  (50M) 

2M  (I0OM) 
5M  (250M) 

99.93  % 

99.93  % 

99.94  % 
99.93  % 

99.90  % 

99.90  % 

99.91  % 
99.90  % 

apache l 

100.0  % 

100.0  % 

Real -work! 

apachc2 

100.0  % 

100.0  % 

Data-set 

ecology  2 

100.0  % 

100.0  % 

(Florida) 

G3_circuit 

99.95  % 

99.95  % 

boneOlO 

99.11  % 

99.12  % 

Table  III:  MWM:  Approximation  ratio  of  our  BP-based  algorithm  on  synthetic  and  sparse  matrix 
collection  data-sets 

For  MWVC,  we  use  two  post-processing  procedures:  greedy  and  2-approximation  algorithm. 
For  the  local  greedy  algorithm,  we  choose  a  random  edge  and  add  one  of  its  adjacent  vertices 
with  a  smaller  weight  until  all  edges  are  covered.  We  compare  the  approximation  qualities  of 
our  BP-based  algorithm  compared  to  the  cases  when  one  uses  only  the  greedy  algorithm  and 
the  2-approximation  algorithm.  Figure  6  shows  the  experimental  results  for  the  two  post¬ 
processing  heuristics.  The  results  show  that  our  BP-based  weight  transformation  enhances  the 
approximation  quality  of  known  approximation  heuristics  by  up  to  43%. 
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Figure  6:  MWVC:  Average  approximation  ratio  of  our  BP-based  algorithm,  the  2-approximation 
algorithm  and  the  greedy  algorithm  on  frb-series  data-sets. 


For  MWIS,  the  experiment  was  performed  on  frb-series  data-sets.  We  use  a  greedy  algorithm 
as  the  post-processing  procedure,  which  selects  vertices  in  the  order  of  higher  weights  until  no 
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vertex  can  be  selected  without  violating  the  independent  set  constraint.  We  compare  the 
approximation  qualities  of  our  BP-based  algorithm  and  the  standard  greedy  algorithm.  Figure 
7  shows  that  our  BP-based  framework  enhances  the  approximation  ratio  of  the  solution  by  2% 
to  23%. 


frb-30-15  frb-45-21  frb-53-24  frb-59-26 

Data  Sets 


Figure  7:  MWIS:  Average  approximation  ratio  of  our  BP-based  algorithm  and  the  greedy  algorithm 
on  frb-series  data-sets. 

Parallelization  Speed-up.  Figure  8  compares  the  running  time  of  the  Blossom  algorithm  and  our  BP- 
based  algorithm  with  1  single  core  and  16  cores.  With  five  million  vertices,  our  asynchronous  parallel 
implementation  is  eight  times  faster  than  the  synchronous  serial  implementation,  while  still  retaining 
99.9%  approximation  ratio  as  reported  in  Table  III.  To  demonstrate  the  overall  benefit  in  context,  we 
compare  its  running  time  with  that  of  the  current  fastest  implementation  of  the  Blossom  algorithm  due 
to  Kolmogorov.  Here,  we  note  that  the  Blossom  algorithm  is  inherently  not  easy  to  parallelize.  For 
parallel  implementation,  we  report  results  for  our  pthread  implementation,  but  the  OpenMP 
implementation  also  show  comparable  performance.  For  20  million  vertices  (one  billion  edges),  it 
shows  that  the  running  time  of  our  algorithm  can  be  accelerated  by  up  to  71  times  than  the  Blossom 
algorithm,  while  sacrificing  0.1%  of  accuracy.  The  running  time  gap  is  expected  be  more  significant 
for  larger  graphs  since  the  running  times  of  our  algorithm  and  the  Blossom  algorithm  are  linear  and 
cubic  with  respect  to  the  number  of  vertices,  respectively. 
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Figure  8:  MWM:  Running  time  of  Blossom  algorithm  and  our  BP-based  algorithms. 

Large-scale  Optimization.  Our  algorithm  can  also  handle  large-scale  instances  because  it  is  based  on 
GMs  that  inherently  lend  itself  to  parallel  and  distributed  implementations.  To  demonstrate  this,  we 
create  a  large-scale  instance  containing  up  to  50  million  vertices  and  2.5  billion  edges.  We  experiment 
our  algorithm  using  GraphChi  on  a  single  consumer  level  machine  with  i7  CPU  and  24GB  of 
memory.  Figure  9  shows  the  running  time  and  memory  usage  of  our  algorithm  for  MWM  and  MWVC 
on  large  data-sets. 
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Figure  9:  MWM  and  MWVC:  Running  time  and  memory  usage  of  GraphChi-based  implementation 
on  large-scale  graphs. 
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Max-Product  Belief  Propagation  for  Linear  Programming 
Applications  to  Combinatorial  Optimization 
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Abstract 

Max-product  belief  propagation  (BP)  is  a  pop¬ 
ular  message-passing  algorithm  for  computing 
a  maximum- a-posteriori  (MAP)  assignment  in 
a  joint  distribution  represented  by  a  graphical 
model  (GM).  It  has  been  shown  that  BP  can 
solve  a  few  classes  of  Linear  Programming  (LP) 
formulations  to  combinatorial  optimization  prob¬ 
lems  including  maximum  weight  matching  and 
shortest  path,  i.e.,  BP  can  be  a  distributed  solver 
for  certain  LPs.  However,  those  LPs  and  corre¬ 
sponding  BP  analysis  are  very  sensitive  to  under¬ 
lying  problem  setups,  and  it  has  been  not  clear 
what  extent  these  results  can  be  generalized  to. 

In  this  paper,  we  obtain  a  generic  criteria  that  BP 
converges  to  the  optimal  solution  of  given  LP, 
and  show  that  it  is  satisfied  in  LP  formulations 
associated  to  many  classical  combinatorial  op¬ 
timization  problems  including  maximum  weight 
perfect  matching,  shortest  path,  traveling  sales¬ 
man,  cycle  packing  and  vertex  cover.  More  im¬ 
portantly,  our  criteria  can  guide  the  BP  design 
to  compute  fractional  LP  solutions,  while  most 
prior  results  focus  on  integral  ones.  Our  results 
provide  new  tools  on  BP  analysis  and  new  direc¬ 
tions  on  efficient  solvers  for  large-scale  LPs. 

1  INTRODUCTION 

Graphical  model  (GM)  has  been  one  of  powerful 
paradigms  for  succinct  representations  of  joint  probability 
distributions  in  variety  of  scientific  fields  (Yedidia  et  al., 
2005;  Richardson  and  Urbanke,  2008;  Mezard  and  Mon- 
tanari,  2009;  Wainwright  and  Jordan,  2008).  GM  repre¬ 
sents  a  joint  distribution  of  some  random  vector  to  a  graph 
structured  model  where  each  vertex  corresponds  to  a  ran¬ 
dom  variable  and  each  edge  captures  to  a  conditional  de¬ 
pendency  between  random  variables.  In  many  applications 
involving  GMs,  finding  maximum-a-posteriori  (MAP)  as¬ 
signment  in  GM  is  an  important  inference  task,  which  is 
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known  to  be  computationally  intractable  (i.e.,  NP-hard)  in 
general  (Chandrasekaran  et  al.,  2008).  Max-product  belief 
propagation  (BP)  is  the  most  popular  heuristic  for  approxi¬ 
mating  a  MAP  assignment  of  given  GM,  where  its  perfor¬ 
mance  has  been  not  well  understood  in  loopy  GMs.  Nev¬ 
ertheless,  BP  often  shows  remarkable  performances  even 
on  loopy  GM.  Distributed  implementation,  associated  ease 
of  programming  and  strong  parallelization  potential  are 
the  main  reasons  for  the  growing  popularity  of  the  BP  al¬ 
gorithm.  Lor  example,  several  software  architectures  for 
implementing  parallel  BPs  were  recently  proposed  (Low 
et  al.,  2010;  Gonzalez  et  al.,  2010;  Ma  et  al.,  2012)  by  dif¬ 
ferent  research  groups  in  machine  learning  communities. 

In  the  past  years,  there  have  been  made  extensive  research 
efforts  to  understand  BP  performances  on  loopy  GMs  be¬ 
hind  its  empirical  success.  Several  characterizations  of  the 
max-product  BP  fixed  points  have  been  proposed  (Weiss 
and  Lreeman,  2001;  Vinyals  et  al.,  2010),  whereas  they  do 
not  guarantee  the  BP  convergence  in  general.  It  has  also 
been  studied  about  the  BP  convergence  to  the  correct  an¬ 
swer,  in  particular,  under  a  few  classes  of  loopy  GM  formu¬ 
lations  of  combinatorial  optimization  problems:  matching 
(Bayati  et  al.,  2005;  Sanghavi  et  al.,  2011;  Huang  and  Je- 
bara,  2007;  Salez  and  Shah,  2009),  perfect  matching  (Bay¬ 
ati  et  al.,  2011),  matching  with  odd  cycles  (Shin  et  al., 
2013)  and  shortest  path  (Ruozzi  and  Tatikonda,  2008).  The 
important  common  feature  of  these  instances  is  that  BP 
converges  to  a  correct  MAP  assignment  if  the  Linear  Pro¬ 
gramming  (LP)  relaxation  of  the  MAP  inference  problem 
is  tight,  i.e.,  it  has  no  integrality  gap.  In  other  words,  BP 
can  be  used  an  efficient  distributed  solver  for  those  LPs, 
and  is  presumably  of  better  choice  than  classical  central¬ 
ized  LP  solvers  such  as  simplex  methods  (Dantzig,  1998), 
interior  point  methods  (Thapa,  2003)  and  ellipsoid  methods 
(Khachiyan,  1980)  for  large-scale  inputs.  However,  these 
theoretical  results  on  BP  are  very  sensitive  to  underlying 
structural  properties  depending  on  specific  problems  and  it 
is  not  clear  what  extent  they  can  be  generalized  to,  e.g., 
the  BP  analysis  for  matching  problems  (Bayati  et  al.,  2005; 
Sanghavi  et  al.,  2011;  Huang  and  Jebara,  2007;  Salez  and 
Shah,  2009)  are  not  extended  to  even  for  perfect  matching 


ones  (Bayati  et  al,  201 1).  In  this  paper,  we  overcome  such 
technical  difficulties  for  enhancing  the  power  of  BP  as  a  LP 
solver. 

Contribution.  We  establish  a  generic  criteria  for  GM  for¬ 
mulations  of  given  LP  so  that  BP  converges  to  the  optimal 
LP  solution.  By  product,  it  also  provides  a  sufficient  con¬ 
dition  for  a  unique  BP  fixed  point.  As  one  can  naturally  ex¬ 
pect  given  prior  results,  one  of  our  conditions  requires  the 
LP  tightness.  Our  main  contribution  is  finding  other  suffi¬ 
cient  generic  conditions  so  that  BP  converges  to  the  correct 
MAP  assignment  of  GM.  First  of  all,  our  generic  criteria 
can  rediscover  all  prior  BP  results  on  this  line,  including 
matching  (Bayati  et  al.,  2005;  Sanghavi  et  al.,  2011;  Huang 
and  Jebara,  2007),  perfect  matching  (Bayati  et  al.,  2011), 
matching  with  odd  cycles  (Shin  et  al.,  2013)  and  shortest 
path  (Ruozzi  and  Tatikonda,  2008),  i.e.,  we  provide  a  uni¬ 
fied  framework  on  establishing  the  convergence  and  cor¬ 
rectness  of  BPs  in  relation  to  associated  LPs.  Furthermore, 
we  provide  new  instances  under  our  framework:  we  show 
that  BP  can  solve  LP  formulations  associated  to  other  pop¬ 
ular  combinatorial  optimizations  including  perfect  match¬ 
ing  with  odd  cycles,  traveling  salesman,  cycle  packing  and 
vertex  cover,  which  are  not  known  in  the  literature.  While 
most  prior  known  BP  results  on  this  line  focused  on  the 
case  when  the  associated  LP  has  an  integral  solution,  the 
proposed  criteria  naturally  guides  the  BP  design  to  com¬ 
pute  fractional  LP  solutions  as  well  (see  Section  4.2  and 
Section  4.4  for  details). 

Our  proof  technique  is  built  upon  on  that  of  Sanghavi  et  al. 
(2011)  where  the  authors  construct  an  alternating  path  in 
the  computational  tree  induced  by  BP  to  analyze  its  perfor¬ 
mance  for  the  maximum  weight  matching  problem.  Such 
a  trick  needs  specialized  case  studies  depending  on  the  as¬ 
sociated  LP  when  the  path  reaches  a  leaf  of  the  tree,  and 
this  is  one  of  main  reasons  why  it  is  not  easy  to  generalize 
to  other  problems  beyond  matching.  The  main  technical 
contribution  of  this  paper  is  providing  a  way  to  avoid  the 
issue  in  the  BP  analysis  via  carefully  analyzing  associated 
LP  poly  topes. 

The  main  appeals  of  our  results  are  providing  not  only 
tools  on  BP  analysis,  but  also  guidelines  on  BP  design  for 
its  high  performance,  i.e.,  one  can  carefully  design  a  BP 
given  LP  so  that  it  satisfies  the  proposed  criteria.  We  run 
such  a  BP  for  solving  the  famous  traveling  saleman  prob¬ 
lem  (TSP),  and  our  experiments  show  that  BP  outperforms 
other  popular  heuristics  (see  Section  5).  Our  results  provide 
not  only  new  tools  on  BP  analysis  and  design,  but  also  new 
directions  on  efficient  distributed  (and  parallel)  solvers  for 
large-scale  LPs  and  combinatorial  optimization  problems. 

Organization.  In  Section  2,  we  introduce  necessary  back¬ 
grounds  for  the  BP  algorithm.  In  Section  3,  we  provide 
the  main  result  of  the  paper,  and  several  concrete  applica¬ 
tions  to  popular  combinatorial  optimizations  are  described 


in  Section  4.  In  Section  5,  we  show  empirical  performances 
of  BP  algorithms  for  solving  TSP. 

2  PRELIMINARIES 

2.1  GRAPHICAL  MODEL 

A  joint  distribution  of  n  (binary)  random  variables  Z  = 
[Zi\  G  {0,  l}n  is  called  a  Graphical  Model  (GM)  if  it  fac¬ 
torizes  as  follows:  for  2  =  [zi]  G  {0,  l}n, 

Pv[Z  =  z]  (X  P[  Ipi(Zi)  If  1pa(za), 

where  are  (given)  non-negative  functions,  so- 

called  factors;  F  is  a  collection  of  subsets 

F  =  {ax,  a2, ak}  C  2{1>2’-’n} 

(each  dj  is  a  subset  of  {1,  2, . . . ,  n}  with  K  l  >  2);  Z* 
is  the  projection  of  z  onto  dimensions  included  in  a.1  In 
particular,  ^  is  called  a  variable  factor.  Figure  1  depicts 
the  the  graphical  relation  between  factors  F  and  variables 
z. 


Figure  1:  Factor  graph  for  the  graphical  model 

Pl[z\  (X  £3)^2  (21,22,  Z^as  (>2,  Z3,  Z4),  he., 

F  =  {(^1,(^2,  <^3}  and  n  =  4.  Each  ay  selects  a  subset 
of  z.  For  example,  a\  selects  {z  1,  Z3}. 

Assignment  z*  is  called  a  maximum-a-posteriori  (MAP) 
assignment  if  z*  =  arg  max^e{0,i}ri  P*[z\.  This  means 
that  computing  a  MAP  assignment  requires  us  to  compare 
Pt[z]  for  all  possible  z,  which  is  typically  computation¬ 
ally  intractable  (i.e.,  NP-hard)  unless  the  induced  bipartite 
graph  of  factors  F  and  variables  z,  so-called  factor  graph, 
has  a  bounded  treewidth  (Chandrasekaran  et  al.,  2008). 

2.2  MAX-PRODUCT  BELIEF  PROPAGATION 

The  (max-product)  BP  algorithm  is  a  popular  heuristic  for 
approximating  the  MAP  assignment  in  GM.  BP  is  imple¬ 
mented  iteratively;  at  each  iteration  t,  BP  maintains  four 
messages  {ra^_^(c) ,  mj_>a(c)  :  c  G  {0, 1}}  between 
every  variable  z^  and  every  associated  a  G  Fj,  where 
Fi  :=  {a  G  F  :  i  G  a};  that  is,  Fi  is  a  subset  of  F 
such  that  all  a  in  Fi  are  associated  with  Z{.  The  messages 

^or  example,  if  z  =  [0, 1,0]  and  a  =  {1,3},  then  = 

[0,0]. 


are  updated  as  follows: 

"47(c)  =  maxcipa(za)  JJ  m'^zj)  (1) 

jea\i 

miia(c)  =  V’i(c)  P[  "4'-»(c)-  (2) 

ot'  EFi\a. 


for  some  matrices  Aa ,  Ca  and  vectors  ba,da.  Now  we 
consider  the  Linear  Programming  (LP)  corresponding  the 
above  GM: 

minimize  w  •  x 

subject  to  =  1,  Va  E  F  (5) 


Where  each  ^  only  sends  messages  to  T};  that  is,  ^  sends 
messages  to  ay  only  if  ay  selects/includes  i.  The  outer- 
term  in  the  message  computation  (1)  is  maximized  over  all 
possible  G  {0, 1}^  with  zi  =  c.  The  inner-term  is  a 
product  that  only  depends  on  the  variables  Zj  (excluding 
Zi)  that  are  connected  to  a.  The  message-update  (2)  from 
variable  zi  to  factor  fi>a  is  a  product  containing  all  messages 
received  by  Zi  in  the  previous  iteration,  except  for  the  mes¬ 
sage  sent  by  itself. 


x  =  [Xi\  £  [0,  l]ra. 

One  can  easily  observe  that  the  MAP  assignments  for  GM 
(4)  corresponds  to  the  (optimal)  solution  of  LP  (5)  if  the 
LP  has  an  integral  solution  x*  G  {0, l}n.  As  stated  in  the 
following  theorem,  we  establish  other  sufficient  conditions 
so  that  the  max-product  BP  can  indeed  find  the  LP  solution. 

Theorem  1  The  max-product  BP  on  GM  (4)  with  arbitrary 
initial  message  converges  to  the  solution  of  LP  (5)  if  the 
following  conditions  hold: 


One  can  reduce  the  complexity  by  combining  (1)  and  (2) 
as: 

mW  (c)  ss  ipi(c)  TT  max  ipa'(za-) 

■L-1-  zar.Zi=c 
oc'  £Fi\ot 

X  JJ  m)^a,(Zj). 

jEot'\i 

The  BP  fixed-point  of  messages  is  defined  as  mt+1  =  m1 
under  the  above  updating  rule.  Given  a  set  of  messages 
{rrii^a(c),ma^i(c)  :  c  G  {0,1}},  the  so-called  BP 
marginal  beliefs  are  computed  as  follows: 

bi[zi]  =  ipi{zi)  riaeFi  ma-ti(zi).  (3) 


Cl.  LP  (5)  has  a  unique  integral  solution  x*  G  {0,  l}n, 
i.e.,  it  is  tight. 

C2.  For  every  i  G  {1,  2, . . . ,  n},  the  number  of  factors  as¬ 
sociated  with  Xi  is  at  most  two,  i.e.,  \Fi\  <  2. 

C3.  For  every  factor  fi>a,  every  xa  G  {0,  l}lal  with 
^a(ia)  =  1,  and  every  i  G  a  with  x$  x*,  there 
exists  7  C  a  such  that 

\{j  €  {*}U7:  \Fj\  =  2}|  <  2 


This  BP  algorithm  outputs  z 
/ 

i 

z?P=  ? 

0 


BP  =  [zfp]  where 

if  bi [1]  >  bi[Q] 
if  bi[l]  =  h[ 0]  . 
if  bi  [1]  <  bi[ 0] 


ipa(x'a)  =  1, 
i’cix'a)  =  1, 


where  x'k  = 


where  xk  = 


xk  ifk£{i}  Uy 
xk  otherwise 

xk  if  k  G  {i}  U  7 
xk  otherwise 


It  is  known  that  zBP  converges  to  a  MAP  assignment  after 
a  sufficient  number  of  iterations,  if  the  factor  graph  is  a 
tree  and  the  MAP  assignment  is  unique.  However,  if  the 
graph  contains  cycles,  the  BP  algorithm  is  not  guaranteed 
to  converge  a  MAP  assignment  in  general. 

3  CONVERGENCE  AND  CORRECTNESS 
OF  BELIEF  PROPAGATION 

In  this  section,  we  provide  the  main  result  of  this  paper: 
a  convergence  and  correctness  criteria  of  BP.  Consider  the 
following  GM:  for  x  =  [ay]  G  {0,  l}n  and  w  =  [wf[  G  Mn, 

Yi[X  =  x\  OC  Y[e~WiXi  II  V’a(^a),  (4) 

i  aEF 

where  F  is  the  set  of  non-variable  factors  and  the  factor 
function  for  a  G  F  is  defined  as 

1  if  Aaxa  >  ba,  Caxa  =  da 

0  otherwise 


Since  Theorem  1  holds  for  arbitrary  initial  messages,  the 
conditions  Cl,  C2,  C3  also  provides  the  uniqueness  of  BP 
fixed-points  in  term  of  marginal  beliefs,  as  follows. 

Corollary  2  The  BP  fixed-points  of  GM  (4)  have  the  same 
marginal  beliefs  if  conditions  Cl,  C2,  C3  hold. 

The  conditions  C2,  C3  are  typically  easy  to  check  given 
GM  (4)  and  the  uniqueness  in  Cl  can  be  easily  guaran¬ 
teed  via  adding  random  noises,  where  we  provide  several 
concrete  examples  in  Section  4.  On  the  other  hand,  the  in¬ 
tegral  property  in  Cl  requires  to  analyze  LP  (5),  where  it 
has  been  extensively  studied  in  the  field  of  combinatorial 
optimization  (Schrijver,  2003).  Nevertheless,  Theorem  1 
provides  important  guidelines  to  design  BP  algorithms,  ir¬ 
respectively  of  the  LP  analysis.  For  example,  in  Section 
5,  we  report  empirical  performances  of  BP  following  the 
above  guideline  for  solving  the  traveling  salesman  prob¬ 
lem,  without  relying  on  whether  the  corresponding  LP  has 
an  integral  solution  or  not. 


3.1  PROOF  OF  THEOREM  1 


4.  Repeat  Step  2,  3  t  times. 


To  begin  with,  we  define  some  necessary  notation.  We  let 
V  denote  the  poly  tope  of  feasible  solutions  of  LP  (5): 

T:={xe  [0, 1]"  :  i/,a(xa)  =  1,  Va  €  F}  . 

Similarly,  Va  is  defined  as 

Va  ■=  f  €  [0,  l]1"1  :  ipa(xa)  =  1  j  . 

We  first  state  the  following  key  technical  lemma. 

Lemma  3  There  exist  universal  constants  K,rj  >  OforLP 
(5)  such  that  if  z  G  [0,  l]n  and  0  <  5  <  p  satisfy  the 
followings: 


1.  There  exist  at  most  two  violated  factors  for  z ,  i.e., 

|{a  G  F  :  ^  ^a}|  <  2. 

2.  For  each  violated  factor  a,  there  exist  i  G  a  such  that 
z t  G  where  =  z  +  £Ci  or  z^  =  z  —  eei  and 
ei  G  {0,1  }n  is  the  unit  vector  whose  i-th  coordinate 
is  1, 


then  there  exists  z$  G  V  such  that  \\z  —  z*\\i  <  eK. 


The  proof  of  Lemma  3  is  presented  in  Section  3.2.  Now, 
from  Condition  Cl,  it  follows  that  there  exists  p  >  0  such 


that 


p  :=  inf 

xEV\x * 


W  '  X  —  W  '  X* 
\\x  —  X*  111 


>  0. 


(6) 


We  let  xl  G  {0, 1,  ?}n  denote  the  BP  estimate  at  the  t- 
th  iteration  for  the  MAP  computation.  We  will  show  that 
under  Conditions  C1-C3 , 


xl  =  x* ,  fort  >  ^  ^max  _|_  j{^ 

where  rcmax  =  maxj  \wf  and  K  is  the  universal  con¬ 
stant  in  Lemma  3.  Suppose  the  above  statement  is  false, 

i.e.,  there  exists  i  G  {1,2,. . . ,  n}  such  that  x\  x\  for 

t  >  ^  w™ax  +  1^  K.  Under  the  assumption,  we  will  reach 
a  contradiction. 


Now  we  construct  a  tree- structured  GM  Ti(t),  popularly 
known  as  the  computational  tree  (Weiss  and  Freeman, 
2001),  as  follows: 


Suppose  the  initial  messages  of  BP  are  set  by  1,  i.e., 
rrij^af)0  =  1.  Then,  if  x *  7^  x\,  it  is  known  (Weiss, 
1997)  that  there  exists  a  MAP  configuration  yMAP  on  Ti  (t) 
with  y^AP  x\  at  the  root  variable.  For  other  initial  mes¬ 
sages,  one  can  guarantee  the  same  property  under  changing 
weights  of  leaf  variables  of  the  tree- structured  GM.  Specif¬ 
ically,  for  a  leaf  variable  k  with  | =  {07 ,  0^2  }  |  =  2  and 
ai  being  its  parent  factor  in  Ti  (t),  we  reset  its  variable  fac¬ 
tor  by  e~w^Vk ,  where 

,  =  max^2:gfc=lV,a2(^a2)njga2\feTng_>ct2(2:J-) 

wk  wk  og  maxZa2:Zk=oipa2(za2)Hjea2\km>j^>,a2(zj) ' 

(7) 

This  is  the  reason  why  our  proof  of  Theorem  1  goes  through 
for  arbitrary  initial  messages.  For  notational  convenience, 
we  present  the  proof  for  the  standard  initial  message  of 
=  1,  where  it  can  be  naturally  generalized  to 
other  initial  messages  using  (7). 

Now  we  construct  a  new  valid  assignment  yNEW  on  the 
computational  tree  Ti(t)  as  follows: 

1.  Initially,  set  yNEW  4—  yMAP . 

2.  Update  the  value  of  the  root  variable  of  Tiff)  by 

y?EW^x*. 

3.  For  each  child  factor  of  root  iGa,  choose  7  C  a 
according  to  Condition  C3  and  update  the  associated 
variable  by  yPIEW  <—  x*  Vj  G  7. 

4.  Repeat  Step  2,3  recursively  by  substituting  Ti(t )  by 
the  subtree  of  Tift)  of  root  j  G  7  until  the  process 
stops  (i.e.,  i  =  j )  or  the  leaf  of  Tift)  is  reached  (i.e.,  i 
does  not  have  a  child). 

One  can  notice  that  the  set  of  revised  variables  in  Step  2  of 
the  above  procedure  forms  a  path  structure  Q  in  the  tree- 
structured  GM.  We  first,  consider  the  case  that  both  ends 
of  the  path  Q  touch  leaves  of  Tift),  where  other  cases  can 
be  argued  in  a  similar  manner.  Define  Q  and  be  the 
number  of  copies  of  Xj  in  path  Q  with  x*j  =  1  and  x*  =  0, 
respectively,  where  C  =  [CjL  ^  =  [Kj\  ^  •  Then,  from 

our  construction  of  yNEW ,  one  can  observe  that 


1.  Add  yi  G  {0, 1}  as  the  root  variable  with  variable  fac¬ 
tor  function  e~WiVi . 

2.  For  each  leaf  variable  y3  and  for  each  a  G  Fj  and 

is  not  associated  with  yj  in  the  current  tree- structured 
GM,  add  a  factor  function  as  a  child  of  yj . 

3.  For  each  leaf  factor  and  for  each  variable  y\~  such 
that  k  G  a  and  y &  is  not  associated  with  f>a  in  the  cur¬ 
rent  tree-structured  GM,  add  a  variable  y^  as  a  child 
of  with  variable  factor  function  e~WkVk . 


yNEW  =  yMAP  +  C-« 


w  •  y 


MAP 


NEW 


=  w  •  (k  —  C). 


If  we  set  z  =  x*  -\-  sfn  —  ()  where  0  <  e  <  min{l/2t,  7}, 
then  one  can  check  that  2  satisfies  the  conditions  of  Lemma 
3  using  Conditions  C2,  C3.  Hence,  from  Lemma  3,  there 
exists  z$  G  V  such  that 


Wz'-zW^KeK 

lkt-*li>£(||CI|i  +  ll«l|i-A')>e(i-A'). 


where  2  =  x*  +  e{n  —  £).  Hence,  it  follows  that 


0<p< 

< 


w  -  —  w  -  X* 

M-alli 

W  •  Z  +  £WmaxK  —  w  -  X* 

e(*  -  if) 

£W  •  (/“£  C)  “I-  ^max-^ 
e{t  -  K ) 

^  '  (a€  C)  +  ^max-^ 


Furthermore,  if  £  >  ^  Wr^ax  +  1^  if,  the  above  inequality 
implies  that 

w  •  yMAP  —  u;  •  yNEW  =  w  .  (k  —  Q 

>  pt~  Omax  +  p)if  >  0. 

This  contradicts  to  the  fact  that  yMAP  is  a  MAP  configura¬ 
tion.  This  completes  the  proof  of  Theorem  1. 

3.2  PROOF  OF  LEMMA  3 

One  can  write  X  =  {x  :  Ax  >  6}  C  [0,  l]n  for  some 
matrix  A  E  Mmxn  and  vector  b  E  Mm,  where  without  loss 
of  generality,  we  can  assume  that  \\AiW2  =  1  where  {A^} 
is  the  set  of  row  vectors  of  A.  We  define 


X£  =  {x  :  Ax  >b  —  el}, 

where  1  is  the  vector  of  ones.  Then,  one  can  check  that 
z  £  V£  for  z,  e  satisfying  conditions  of  Lemma  3.  Now  we 
aim  for  finding  a  universal  constant  K  satisfying 

dist(X,  V£)  '=  max(min  \\x  —  y ||  1)  <  sK , 

xev£  yev 

which  leads  to  the  conclusion  of  Lemma  3. 

To  this  end,  for  £  C  [1,  2, . . . ,  m\  with  |£|  =  n,  we  let  A^ 
be  the  square  sub-matrix  of  A  by  choosing  £-th  rows  of  A 
and  b %  is  the  n-dimensional  sub  vector  of  b  corresponding  £. 
Throughout  the  proof,  we  only  consider  £  such  that  A^  is 
invertible.  Using  this  notation,  we  first  claim  the  following. 


From  (8)  and  (9),  there  exists  a  row  vector  Ai  of  A^  and 
the  corresponding  element  bi  of  b %  such  that 

ir(r+(^Y)  y)>bi' 

Using  the  above  inequality  and  A$  •  (Ax  +  (1  —  A )y)  =  bi, 
one  can  conclude  that 

i4i'Gx+(i-'i)  y)<bh 

which  contradict  to  |x  +  (l  —  ^)  y  E  V.  This  completes 
the  proof  of  Claim  4.  □ 

We  also  note  that  if  v  is  a  vertex  of  poly  tope  V,  there  exists 
£  such  that  A^  is  invertible  and  v  =  A^-1^.  We  define  the 
following  notation: 

X  =  {£  :  A~\  E  V}  Is  =  {£  :  A~\b^  -  el)  E  XJ, 

where  Claim  4  implies  that  { v %  :=  A^-1^  :  £  E  X}  and 
:=  A^“1(6^  —  el)  :  £  E  X£}  are  sets  of  vertices  of 
V  and  X£,  respectively.  Using  the  notation,  we  show  the 
following  claim. 

Claim  5  There  exists  p  >  0  such  that  I£  cX  /hr  a//  6  E 
(0,??). 

Proof.  Suppose  p  >  0  satisfying  the  conclusion  of  Claim 
5  does  not  exist.  Then,  there  exists  a  strictly  decreasing 
sequence  {ek  >  0  :  k  =  1,2, . . .}  converges  to  0  such  that 
X£k  —  X  7^  0.  Since  |{£  :  £  C  [1,  2, . . . ,  m\}\  <  00,  there 
exists  £'  such  that 

| JC:={k  :  £'  E  l£k  —  X}  |  =00.  (10) 

For  any  k  E  /C,  observe  that  the  sequence  {u^j££  :  ^  > 
k,i  E  JC}  converges  to  vg.  Furthermore,  all  points  in  the 
sequence  are  in  V£k  since  V££  C  X£fc  for  any  i>k.  There¬ 
fore,  one  can  conclude  that  vg  E  V£k  for  all  k  E  1C,  where 
we  additionally  use  the  fact  that  V£k  is  a  closed  set.  Be¬ 
cause  V  =  r\keJCV£k’  A  must  be  that  E  X,  i.e., 
must  be  a  vertex  of  V  from  Claim  4.  This  contradicts  to  the 
fact  £'  ^  X.  This  completes  the  proof  of  Claim  5.  □ 


Claim  4  If  A^  is  invertible  and  v £  :=  A^  1b^  E  V,  then  v £ 
is  a  vertex  of  poly  tope  V. 

Proof.  Suppose  v £  is  not  a  vertex  of  X,  i.e.  there  exist 
x,  y  E  X  such  that  x  /  y  and  =  Ax  +  (1  —  A )y  for 
some  A  E  (0, 1/2].  Under  the  assumption,  we  will  reach  a 
contradiction.  Since  X  is  a  convex  set, 

ii+H)sep'  (8) 

However,  as  A^  is  invertible, 


From  the  above  claim,  we  observe  that  any  xEXe  can  be 
expressed  as  a  convex  combination  of  {u^£  •  £  E  X},  i.e., 
x  =  with  \  =  1  and  A^  >  0.  For  all 

5  E  (0,  p)  for  p  >  0  in  Claim  5,  one  can  conclude  that 


dist(X,  V£)  <  max  II  -X  Vdli 

£  £eX  £eX 


=  max  5 1| 


«€X 


<  emax  ||A^  1l||i, 


where  we  choose  Ff  =  max^  ||A^  1 1 1|  1 .  This  completes 
the  proof  of  Lemma  3. 


4  APPLICATIONS  OF  THEOREM  1  TO 
COMBINATORIAL  OPTIMIZATION 


The  uniqueness  condition  in  the  above  corollary  is  easy  to 
guarantee  by  adding  small  random  noises  to  edge  weights. 


In  this  section,  we  introduce  concrete  instances  of  LPs 
satisfying  the  conditions  of  Theorem  1  so  that  BP  cor¬ 
rectly  converges  to  its  optimal  solution.  Specifically,  we 
consider  LP  formulations  associated  to  several  combina¬ 
torial  optimization  problems  including  shortest  path,  max¬ 
imum  weight  perfect  matching,  traveling  salesman,  maxi¬ 
mum  weight  disjoint  vertex  cycle  packing  and  vertex  cover. 
We  note  that  the  shortest  path  result,  Corollary  6,  is  known 
(Ruozzi  and  Tatikonda,  2008),  where  we  rediscover  it  as  a 
corollary  of  Theorem  1 .  Our  other  results,  Corollaries  7-11, 
are  new  and  what  we  first  establish  in  this  paper. 


4.1  SHORTEST  PATH 


Given  directed  graph  G  =  (V,E)  and  non-negative  edge 
weights  w  =  [we  :  e  E  E]  E  ,  the  shortest  path  prob¬ 
lem  is  to  find  the  shortest  path  from  the  source  s  to  the 
destination  t:  it  minimizes  the  sum  of  edge  weights  along 
the  path.  One  can  naturally  design  the  following  LP  for  this 
problem: 


minimize  w  •  x 


subject  to 


E  Xe~  E  Xe 

eE<5°(  v)  eESi(v ) 

{1  if  v  =  s  (11) 

—  1  if  v  =  t  \/  v  e  V 
0  otherwise 


x  =  [xe]  G  [0, 1]|B|. 


where  5l(v),  S°(v)  are  the  set  of  incoming,  outgoing  edges 
of  v.  It  is  known  that  the  above  LP  always  has  an  integral 
solution,  i.e.,  the  shortest  path  from  s  to  t.  We  consider  the 
following  GM  for  LP  (11): 


Pr[X  =  x]  ex  JJ  e~WeXe  H  (12) 

eeE  vev 


where  the  factor  function  ijjv  is  defined  as 

1  if  ^2ee6°(v)  Xe  —  Eee<5*0)  Xe 

[  1  if  v  =  s 


^v(xs(  „))  =  < 


—  1  if  v  =  t 
0  otherwise 


0  otherwise 
v 


For  the  above  GM  (12),  one  can  easily  check  Conditions 
C2,  C3  of  Theorem  1  hold  and  derive  the  following  corol¬ 
lary  whose  formal  proof  is  presented  in  the  supplementary 
material  due  to  the  space  constraint. 

Corollary  6  If  the  shortest  path  from  s  to  t,  i.e.,  the  solu¬ 
tion  of  the  shortest  path  LP  ( 11),  is  unique,  then  the  max- 
product  BP  on  GM  (12)  converges  to  it. 


4.2  MAXIMUM  WEIGHT  PERFECT  MATCHING 

Given  undirected  graph  G  =  (V,E)  and  non-negative  edge 
weights  w  =  [we  :  e  E  E\  E  on  edges,  the  maximum 
weight  perfect  matching  problem  is  to  find  a  set  of  edges 
such  that  each  vertex  is  connected  to  exactly  one  edge  in 
the  set  and  the  sum  of  edge  weights  in  the  set  is  maximized. 
One  can  naturally  design  the  following  LP  for  this  problem: 


maximize  w  •  x 

subject  to  xe  =  1,  Vu  E  V 

eES(v ) 

x  =  [xe]  e  [0,  l]|i?l. 

where  S(v)  is  the  set  of  edges  connected  to  a  vertex  v.  If 
the  above  LP  has  an  integral  solution,  it  corresponds  to  the 
solution  of  the  maximum  weight  perfect  matching  problem. 


It  is  known  that  the  maximum  weight  matching  LP 
(13)  always  has  a  half-integral  solution  x*  E  {0,  1}^. 

We  will  design  BP  for  obtaining  the  half-integral  solution. 
To  this  end,  duplicate  each  edge  e  to  ei,  and  define  a 
new  graph  G'  =  (V,E')  where  E'  =  {e i,e2  :  e  E  E}. 
Then,  we  suggest  the  following  equivalent  LP  that  always 
have  an  integral  solution: 

maximize  wt  •  x 

subject  to  xei  —  2  Vf  E  V 

eiES(v) 

x  =  [xei]  e  [o,  Ip'1. 

where  wr  =  w'e2  =  we.  One  can  easily  observe  that  solv¬ 
ing  LP  (14)  is  equivalent  to  solving  LP  (13)  due  to  our  con¬ 
struction  of  G'  and  w' .  Now,  construct  the  following  GM 
for  LP  (14): 

Pr[X  =  a;]  oc  JJ  ew^Xei  IJ  ipv{xs(v)),  (15) 

ei  EE'  vev 

where  the  factor  function  f>v  is  defined  as 


'Ipv  (*£<5(i>)  ) 


1  if£ei 


eiES(v) 


-  2 


0  otherwise 


For  the  above  GM  (15),  we  derive  the  following  corollary 
of  Theorem  1  whose  formal  proof  is  presented  in  the  sup¬ 
plementary  material  due  to  the  space  constraint. 

Corollary  7  If  the  solution  of  the  maximum  weight  perfect 
matching  LP  (14)  is  unique,  then  the  max-product  BP  on 
GM  (15)  converges  it. 


Again,  the  uniqueness  condition  in  the  above  corollary  is 
easy  to  guarantee  by  adding  small  random  noises  to  edge 
weights  [w'e.\.  We  note  that  it  is  known  (Bayati  et  al.,  2011) 
that  BP  converges  to  the  unique  and  integral  solution  of  LP 
(13),  while  Corollary  7  implies  that  BP  can  solve  it  without 
the  integrality  condition.  We  note  that  one  can  easily  ob¬ 
tain  a  similar  result  for  the  maximum  weight  (non-perfect) 
matching  problem,  where  we  omit  the  details  in  this  paper. 


Now,  we  construct  the  following  GM  from  the  above  LP: 

Pr[y  =  y]  oc  IJ  eWeVe  JJ  ipv{ys{v))  JJ  ipc(ys(vc )), 
eeE  vev  cec 

(18) 

where  the  factor  function  ipc  is  defined  as 


^v(ys(v)) 


^  ^  ^2e£S(v)  y e  1 

0  otherwise 


4.3  MAXIMUM  WEIGHT  PERFECT  MATCHING 
WITH  ODD  CYCLES 


In  previous  section  we  prove  that  BP  converges  to  the  opti¬ 
mal  (possibly,  fractional)  solution  of  LP  (14),  equivalently 
LP  (13).  One  can  add  odd  cycle  (also  called  Blossom)  con¬ 
straints  and  make  those  LPs  tight  i.e.  solves  the  maximum 
weight  perfect  matching  problem: 


maximize 
subject  to 


W  •  X 


*e  =  l,  Mv&V 

e£S(v) 

5>e<^-l,  vCeC, 

eec 

x  =  [xe]  e  [0,1]  1*1. 


(16) 


where  C  is  a  set  of  odd  cycles  in  G.  The  authors  (Shin 
et  al.,  2013)  study  BP  for  solving  LP  (16)  by  replacing 

EeeS(v)  xe  =  1  by  T,eeS(v)  xe  -  1’ i-e-’ for  the  maximum 
weight  (non-perfect)  matching  problem.  Using  Theorem  1, 
one  can  extend  the  result  to  the  maximum  weight  perfect 
matching  problem,  i.e.,  solving  LP  (16).  To  this  end,  we 
follow  the  approach  (Shin  et  al.,  2013)  and  construct  the 
following  graph  G'  =  (V',E')  and  weight  w'  =  [w'e  :  e  G 
E']  G  M}E  I  given  set  C  of  disjoint  odd  cycles: 


v'  =  V  U  {vc  :  C  G  C} 

E'  =  {(u,  vc)  :ueC,C  eC}UE\{eeC  :C  eC} 


1 

2 


= 


E 


e’eE(C) 


(_ l)dc(u,e')Wel 


We 


if  e  =  (u,  vc) 

for  some  C  G  C  , 
otherwise 


where  dc(u,  e')  is  the  graph  distance  between  u,  er  in  cycle 
C.  Then,  LP  (16)  is  equivalent  to  the  following  LP: 

maximize  w'  •  y 

subject  to  ye  =  1,  Wv  G  V 

e£  S(v) 

22  (~l)dcMy{vc  u)  e  [0;2],  Ve  e  E{C) 
uev(C) 


E  Ve<  \C\  -  1,  VC  e  C 

e£S(vc ) 

y=  [ye]  e  [o,  i]|£/|. 

(17) 


c(ys(vc )) 


'1  if  E uevioi-^^Vivcu)  €  {0,2} 

Eee<5(«c)  Ve  <  |C|  -  1 

0  otherwise 


For  the  above  GM  (18),  we  derive  the  following  corollary 
of  Theorem  1  whose  formal  proof  is  presented  in  the  sup¬ 
plementary  material  due  to  the  space  constraint. 

Corollary  8  If  the  solution  of  the  maximum  weight  perfect 
matching  with  odd  cycles  LP  (17)  is  unique  and  integral, 
then  the  max-product  BP  on  GM  (18)  converges  to  it. 


We  again  emphasize  that  a  similar  result  for  the  maximum 
weight  (non-perfect)  matching  problem  was  established  in 
(Shin  et  al.,  2013).  However,  the  proof  technique  in  the 
paper  does  not  extend  to  the  perfect  matching  problem. 
This  is  in  essence  because  presumably  the  perfect  match¬ 
ing  problem  is  harder  than  the  non-perfect  matching  one. 
Under  the  proposed  generic  criteria  of  Theorem  1,  we  over¬ 
come  the  technical  difficulty. 


4.4  VERTEX  COVER 

Given  undirected  graph  G  =  ( V,E )  and  non-negative  in¬ 
teger  vertex  weights  b  =  [bv  :  v  G  V]  G  z|^,  the  vertex 
cover  problem  is  to  find  a  set  of  vertices  minimizes  the  sum 
of  vertex  weights  in  the  set  such  that  each  edge  is  connected 
to  at  least  one  vertex  in  it.  This  problem  is  one  of  Karp’s 
21  NP-complete  problems  (Karp,  1972).  The  associated  LP 
formulation  to  the  vertex  cover  problem  is  as  follows: 

minimize  b  •  y 

subject  to  Vu  +  Vv>  l,(u,v)eE  (19) 

y  =  [yv]e[  0,1]^. 

However,  if  we  design  a  GM  from  the  above  LP,  it  does  not 
satisfy  conditions  in  Theorem  1 .  Instead,  we  will  show  that 
BP  can  solve  the  following  dual  LP: 

maximize  xe 

e£E 

subject  to  ^  xe<bv,  \/ v  G  V  (20) 

e£d{v ) 

x  =  [xe\  G 

Note  that  the  above  LP  always  has  a  half-integral  solution. 
As  we  did  in  Section  4.2,  one  can  duplicate  edges,  i.e., 


E'  =  {ei, . . .  ,e2bmax  :  e  e  E}  with  6max  =  ma xvbv, 

and  design  the  following  equivalent  LP  having  an  integral 
solution: 

maximize  w'  •  x 

subject  to  x&i  <  VveV 

aeSiv) 

*=[**]  €[0,1]'^ 

where  w'e .  =  rce  for  e  £  E  and  its  copy  e  E' .  From  the 
above  LP,  we  can  construct  the  following  GM: 

Pr[X=x]  (X  n  n  (22) 

G-E1'  veV 

where  the  factor  function  ^ v  is  defined  as 


'Ipv  (x8(v) ) 


^  ^  J2ei£8(v)  Xe%  — 

0  otherwise 


For  the  above  GM  (22),  we  derive  the  following  corollary 
of  Theorem  1  whose  formal  proof  is  presented  in  the  sup¬ 
plementary  material  due  to  the  space  constraint. 

Corollary  9  If  the  solution  of  the  vertex  cover  dual  LP  ( 21) 
is  unique,  then  the  max-product  BP  on  GM  (22)  converges 
it. 


Again,  the  uniqueness  condition  in  the  above  corollary  is 
easy  to  guarantee  by  adding  small  random  noises  to  edge 
weights  [w'e  ].  We  further  remark  that  if  the  solution  of 
the  primal  LP  (19)  is  integral,  then  it  can  be  easily  found 
from  the  solution  of  the  dual  LP  (21)  using  the  strictly  com¬ 
plementary  slackness  condition  (Bertsimas  and  Tsitsiklis, 
1997) . 


4.5  TRAVELING  SALESMAN 


For  the  above  GM  (24),  we  derive  the  following  corollary 
of  Theorem  1  whose  formal  proof  is  presented  in  the  sup¬ 
plementary  material  due  to  the  space  constraint. 

Corollary  10  If  the  solution  of  the  traveling  salesman  LP 
(23)  is  unique  and  integral,  then  the  max-product  BP  on 
GM  (24)  converges  it. 

Again,  the  uniqueness  condition  in  the  above  corollary  is 
easy  to  guarantee  by  adding  small  random  noises  to  edge 
weights.  In  Section  5,  we  show  the  empirical  performance 
of  the  max-product  BP  on  GM  (24)  for  solving  TSP  without 
relying  on  the  integrality  condition  in  Corollary  10. 

4.6  MAXIMUM  WEIGHT  CYCLE  PACKING 

Given  undirected  graph  G  =  (V,E)  and  non-negative  edge 

\E\ 

weights  w  =  [we  :  e  G  E]  G  ,  the  maximum  weight 
vertex  disjoint  cycle  packing  problem  is  to  find  the  maxi¬ 
mum  weight  set  of  cycles  with  no  common  vertex.  It  is  easy 
to  observe  that  it  is  equivalent  to  find  a  subgraph  maximiz¬ 
ing  the  sum  of  edge  weights  on  it  such  that  each  vertex  of 
the  subgraph  has  degree  2  or  0.  The  natural  LP  formulation 
to  this  problem  is  following: 

maximize  w  •  x 

subject  to  ^  Xe  =  2 yv  ^5) 

e£8(y) 

x=[xe]  €  [o,ipi,i/  =  [y„]  e  [0,1]IVL 

From  the  above  LP,  one  can  construct  the  following  GM: 

Pr[X  =  x,Y  =  y]  oc  JJ  eWeXe  JJ  if>v(xS(v),yv), 

e£E  v£V 

(26) 

where  the  factor  function  f>v  is  defined  as 


Given  directed  graph  G  =  (V,E)  and  non-negative  edge 
weights  w  =  [we  :  e  G  E]  G  ,  the  traveling  salesman 
problem  (TSP)  is  to  find  the  minimum  weight  Hamiltonian 
cycle  in  G.  The  natural  LP  formulation  to  TSP  is  following: 

minimize  w  •  x 

subject  to  xe  —  2  ^3^ 

e£S(v ) 

*  =[*e]€  [0,1]^. 

From  the  above  LP,  one  can  construct  the  following  GM: 

Vv[X  =  x\  oc  JJ  e~WeXe  JJ  ipv{xS(v)),  (24) 

e£E  v£V 

where  the  factor  function  f>v  is  defined  as 


'Ipv  ( x8(v )  5  Vv) 


1  if  Eee5(t,)  =  2 yv 

0  otherwise 


For  the  above  GM  (26),  we  derive  the  following  corollary 
of  Theorem  1  whose  formal  proof  is  presented  in  the  sup¬ 
plementary  material  due  to  the  space  constraint. 

Corollary  11  If  the  solution  of  maximum  weight  vertex 
disjoint  cycle  packing  LP  (25)  is  unique  and  integral,  then 
the  max-product  BP  on  GM  (26)  converges  it. 


Again,  the  uniqueness  condition  in  the  above  corollary  is 
easy  to  guarantee  by  adding  small  random  noises  to  edge 
weights. 


5  EXPERIMENTAL  RESULTS  FOR 
TRAVELING  SALESMAN  PROBLEM 


^v{x8(v)) 


1  if  EeeS(v)  se  =  2 

0  otherwise 


In  this  section,  we  report  empirical  performances  of  BP  on 
GM  (24)  for  solving  the  traveling  salesman  problem  (TSP) 


Table  1:  Experimental  results  for  small  size  complete  graph  and  each  number  is  the  average  among  100  samples.  For 
example,  Greedy +BP  means  that  the  Greedy  algorithm  using  edge  weights  as  BP  beliefs  as  we  describe  in  Section  5.  The 
left  value  is  the  approximation  ratio,  i.e.,  the  average  weight  ratio  between  the  heuristic  solution  and  the  exact  solution. 
The  right  value  is  the  average  weight  of  the  heuristic  solutions.  The  last  row  is  a  ratio  of  tight  TSP  LP  (23). 


Size 

5 

10 

15 

20 

25 

Greedy 

1.07/1.84 

1.20/2.25 

1.33/2.58 

1.51/2.85 

1.51/3.04 

Greedy+BP 

1.00/1.75 

1.05/1.98 

1.13/2.23 

1.19/2.27 

1.21/2.43 

Christofides 

1.38/1.85 

1.38/2.56 

1.67/3.20 

1.99/3.75 

2.16/4.32 

Christofides+BP 

1.00/1.75 

1.09/2.07 

1.23/2.43 

1.30/2.50 

1.45/2.90 

Insertion 

1.03/1.79 

1.29/2.38 

1.53/2.95 

1.72/3.26 

1.89/3.77 

Insertion+BP 

1.00/1.75 

1.29/2.39 

1.52/2.97 

1.79/3.38 

1.94/3.89 

N-Neighbor 

1.07/1.84 

1.27/2.39 

1.42/2.74 

1.55/2.96 

1.64/3.30 

N-Neighbor+BP 

1.00/1.75 

1.05/1.98 

1.13/2.23 

1.15/2.21 

1.20/2.40 

2-Opt 

1.00/1.75 

1.08/2.04 

1.12/2.21 

1.24/2.36 

1.28/2.57 

2-Opt+BP 

1.00/1.75 

1.04/1.96 

1.07/2.11 

1.11/2.13 

1.16/2.34 

Tight  LPs 

100% 

93% 

88% 

87% 

84% 

Table  2:  Experimental  results  for  sparse  Erdos-Renyi  graph  with  fixed  average  vertex  degrees  and  each  number  is  the 
average  among  1000  samples.  The  left  value  is  the  ratio  that  a  heuristic  finds  the  Hamiltonian  cycle  without  penalty  edges. 
The  right  value  is  the  average  weight  of  the  heuristic  solutions. 


Size 

100 

200 

Degree 

10 

25 

50 

10 

25 

50 

Greedy 

0%  /  7729.43 

0.3%/ 2841.98 

13%/  1259.08 

0%/ 15619.9 

0%/ 5828.88 

0.3%  /  2766.07 

Greedy+BP 

14%/ 1612.82 

21%/ 1110.27 

44%  /  622.488 

6.4%/ 2314.95 

10%/  1687.29 

16%/ 1198.48 

Christoifeds 

0%  /  19527.3 

0%/ 16114.3 

0%  /  10763.7 

0%/ 41382.5 

0%  /  37297.0 

0%/ 32023.1 

Christofides+BP 

14%/ 2415.73 

20%/  1663.47 

34%  /  965.775 

6.1%/ 3586.77 

9.2%/ 2876.35 

12%/ 2183.80 

Insertion 

0%  /  12739.2 

84%/  198.099 

100%/  14.2655 

0%/ 34801.6 

0.9%/ 3780.71 

99%/ 44.1293 

Insertion+BP 

0%/  13029.0 

76%/ 283.766 

100%/  14.6964 

0%  /  34146.7 

0.3%/ 4349.11 

99%/ 41.2176 

N-Neighbor 

0%/ 9312.77 

0%/ 3385.14 

7.6%/ 1531.83 

0%/  19090.7 

0%/ 7383.23 

0.3%  /  3484.82 

N-Neighbor+BP 

16%  / 1206.95 

26%  /  824.232 

50%  /  509.349 

6.9%/  1782.17 

12%/ 1170.38 

24%  /  888.421 

2-Opt 

34%  /  1078.03 

100%  /  14.6873 

100%/ 7.36289 

2%  /  3522.78 

100%/ 35.8421 

100%/ 18.6147 

2-Opt+BP 

76%/ 293.450 

100%/  13.5773 

100%/ 6.53995 

33%/  1088.79 

100%  /  34.7768 

100%/  17.4883 

Tight  LPs 

62% 

62.3% 

63% 

52.2% 

55% 

52.2% 

that  is  presumably  the  most  famous  one  in  combinatorial 
optimization.  In  particular,  we  design  the  following  BP- 
guided  heuristic  for  solving  TSP: 

1.  Run  BP  for  a  fixed  number  of  iterations,  say  100,  and 
calculate  the  BP  marginal  beliefs  (3). 

2.  Run  the  known  TSP  heuristic  using  edge  weights  as 
log  £jjj  using  BP  margianl  beliefs  instead  of  the  orig¬ 
inal  weights. 

For  TSP  heuristic  in  Step  2,  we  use  Greedy,  Christoifeds, 
Insertion,  N-Neighbor  and  2-Opt  provided  by  the  LEMON 
graph  library  (Dezso  et  al.,  201 1).  We  first  perform  the  ex¬ 
periments  on  the  complete  graphs  of  size  5,  10,  15,  20,  25 
and  random  edge  weight  in  (0, 1)  to  measure  approxima¬ 
tion  qualities  of  heuristics,  where  it  is  reported  in  Table  1 . 
Second,  we  consider  the  sparse  Erdos-Renyi  random  graph 
of  size  100,  200  and  random  edge  weight  in  (0, 1).  Then, 
we  make  it  a  complete  graph  by  adding  non-existing  edges 
with  penalty  edge  weight  1000. 2  For  these  random  in¬ 

2This  is  to  ensure  that  a  Hamiltonian  cycle  always  exists. 


stances,  we  report  performance  of  various  heuristics  in  Ta¬ 
ble  2.  Our  experiments  show  that  BP  boosts  performances 
of  known  TSP  heuristics  in  overall,  where  BP  is  very  easy 
to  code  and  does  not  hurt  the  simplicity  of  heuristics. 

6  CONCLUSION 

The  BP  algorithm  has  been  the  most  popular  algorithm 
for  solving  inference  problems  arising  graphical  models, 
where  its  distributed  implementation,  associated  ease  of 
programming  and  strong  parallelization  potential  are  the 
main  reasons  for  its  growing  popularity.  In  this  paper,  we 
aim  for  designing  BP  algorithms  solving  LPs,  and  pro¬ 
vide  sufficient  conditions  for  its  correctness  and  conver¬ 
gence.  We  believe  that  our  results  provide  new  interesting 
directions  on  designing  efficient  distributed  (and  parallel) 
solvers  for  large-scale  LPs. 
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Abstract — Graphical  Model  (GM)  has  provided  a  popular 
framework  for  big  data  analytics  because  it  often  lends  itself 
to  distributed  and  parallel  processing  by  utilizing  graph-based 
6 local’  structures.  It  models  correlated  random  variables  where 
in  particular,  the  max-product  Belief  Propagation  (BP)  is  the 
most  popular  heuristic  to  compute  the  most-likely  assignment 
in  GMs.  In  the  past  years,  it  has  been  proven  that  BP  can  solve 
a  few  classes  of  combinatorial  optimization  problems  under 
certain  conditions. 

Motivated  by  this,  we  explore  the  prospect  of  using  BP 
to  solve  generic  combinatorial  optimization  problems.  The 
challenge  is  that,  in  practice,  BP  may  converge  very  slowly 
and  even  if  it  does  converge,  the  BP  decision  often  violates 
the  constraints  of  the  original  problem.  This  paper  proposes 
a  generic  framework  that  enables  us  to  apply  BP-based 
algorithms  to  compute  an  approximate  feasible  solution  for 
an  arbitrary  combinatorial  optimization  task.  The  main  novel 
ingredients  include  (a)  careful  initialization  of  BP  messages, 
(b)  hybrid  damping  on  BP  updates,  and  (c)  post-processing 
using  BP  beliefs.  Utilizing  the  framework,  we  develop  parallel 
algorithms  for  several  large-scale  combinatorial  optimization 
problems  including  maximum  weight  matching,  vertex  cover 
and  independent  set.  We  demonstrate  that  our  framework 
delivers  high  approximation  ratio,  speeds  up  the  process  by 
parallelization,  and  allows  large-scale  processing  involving 
billions  of  variables. 

Keywords -Combinatorial  optimization,  Belief  propagation, 
Parallel  algorithm,  Maximum  weighted  matching 

I.  Introduction 

Graphical  Models  (GMs)  provide  a  useful  framework 
for  modeling  and  processing  real-world,  large-scale  data 
applications.  From  traditional  big  data  analytics,  such  as 
page  rank  [1]  and  graph  mining  [2],  to  more  recent  deep 
learning  [3],  graphical  models  have  been  commonly  applied 
for  processing  large-scale  data-sets.  GM  is  a  particularly 
good  fit  for  big  data  applications  because  it  lends  itself 
to  fast  parallel  implementations  by  utilizing  graph-based 
‘local’  structures.  Several  modern  programming  models,  such 
as  GraphLab  [4],  GraphChi  [5]  and  GraphX  [6],  enable 
distributed,  parallel  computation  on  GMs. 

One  of  the  most  common  computational  tasks  found  in 
GM’s  applications  is  to  compute  the  most-likely  assign¬ 
ment  to  random  variables,  so-called  a  MAP  (Maximum- 
A-Posteriori)  estimate.  This  can  be  viewed  as  solving  a  large- 
scale  optimization  problem,  which  is  becoming  increasingly 
important  for  big  data  analytics  since  it  presents  a  major 
computational  bottleneck.  Motivated  by  this,  we  explore 

*The  first  two  authors  contributed  equally  to  this  work. 


the  prospect  of  using  GMs  to  solve  optimization  problems 
at  scale.  In  particular,  we  propose  a  message-passing  based 
algorithm  to  solve  combinatorial  optimization  problems  based 
on  Belief  Propagation  (BP).  The  (max-product)  BP  algorithm 
is  a  well-studied  heuristic  that  has  been  popularly  used  to 
approximately  solve  MAP  optimization  tasks.  It  is  an  iterative, 
message-passing  algorithm  proven  to  produce  exact  solutions 
for  tree  structured  GMs.  However,  understanding  on  the 
performance  of  BP  for  loopy  GMs  has  been  quite  limited. 

Our  goal  is  to  design  a  highly  accurate  approximation 
algorithm  based  on  BP  that  solves  generic  large-scale 
combinatorial  optimization  problems.  The  benefit  of  such 
an  algorithm  is  that  it  inherently  lends  itself  to  parallel 
implementations,  enabling  fast  performance  and  ensuring 
scalability.  However,  the  challenge  is  that  the  BP  algorithm 
is  not  guaranteed  to  be  correct  or  even  converge  in  general. 
Even  if  it  converges  to  the  correct  solution,  its  convergence 
speed  is  too  low  for  solving  large-scale  instances.  Especially, 
when  there  are  multiple  optima  (multiple  solutions),  BP  is 
known  to  oscillate  in  many  cases  [7,  8,  9].  One  can  stop 
BP  iterations  without  waiting  for  convergence,  but  then  BP 
algorithms  often  produce  infeasible  solutions,  i.e.,  violate  the 
constraints  of  the  targeted  combinatorial  optimization. 

Contribution.  We  resolve  the  issues  by  designing  a  generic 
BP-based  framework  that  computes  highly  accurate  and 
feasible  approximation  solutions.  The  basic  idea  is  to  use  a 
truncated  BP  algorithm  in  conjunction  with  existing  heuristics 
to  enforce  a  feasible  solution  in  a  way  that  ensures  high 
approximation  ratio.  At  a  high  level,  the  algorithm  takes 
the  following  steps.  First,  given  an  optimization  problem, 
we  represent  the  problem  in  the  MAP  framework  of  GM. 
Second,  we  run  the  corresponding  BP’s  message-passing  and 
‘partially’  solve  the  optimization  problem.  However,  because 
BP  often  does  not  converge  quickly,  we  run  only  a  fixed 
number  of  BP  iterations;  we  do  not  wait  for  convergence. 
Instead,  to  boost  up  the  quality  of  BP  decisions,  we  (a) 
carefully  initialize  the  BP  messages,  (b)  add  a  small  noise 
to  weights,  and  (c)  apply  a  hybrid  damping  strategy  on 
BP  message  updates  (Section  III-B).  Finally,  we  apply 
existing  heuristics  as  post-processing  procedures  to  enforce 
the  feasibility  of  the  BP  decision.  In  particular,  we  run  known 
heuristics  for  a  given  combinatorial  optimization  problem  by 
replacing  the  parameters  of  the  original  problems  (e.g.,  edge 
weights)  with  BP  beliefs.  This  ensures  that  the  framework 
is  applicable  to  any  combinatorial  optimization  problems, 
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while  achieving  a  higher  approximation  ratio  than  existing 
heuristics. 

In  summary,  this  paper  makes  three  key  contributions: 

1)  Practical  BP-based  algorithm  design:  To  the  best 
of  our  knowledge,  this  paper  is  the  first  to  propose  a 
generic  concept  for  designing  BP-based  algorithms  that 
solve  large-scale  combinatorial  optimization  problems. 

2)  Parallel  implementation:  We  also  demonstrate  that 
the  algorithm  is  easily  parallelizable.  For  the  maximum 
weighted  matching  problem,  this  translates  to  7 lx  speed 
up  while  sacrificing  only  0.1%  accuracy  compared  to 
the  state-of-art  exact  algorithm  [10]. 

3)  Extensive  empirical  evaluation:  We  evaluate  our  al¬ 
gorithms  on  three  different  combinatorial  optimization 
problems  on  diverse  synthetic  and  real-world  data-sets. 
Our  evaluation  shows  that  the  framework  shows  higher 
accuracy  compared  to  other  known  heuristics. 

Related  Work.  In  the  past  years,  the  convergence  and 
correctness  of  BP  has  been  studied  analytically  for  several 
classical  combinatorial  optimization  problems,  including 
matchings  [7,  11,  12],  perfect  matchings  [13],  shortest  paths 
[8],  independent  sets  [14],  network  flows  [9]  and  vertex 
covers  [15].  The  important  common  feature  of  these  models 
is  that  BP  converges  to  a  correct  assignment  when  the 
linear  programming  (LP)  relaxation  of  the  combinatorial 
optimization  is  tight,  i.e.,  when  it  shows  no  integrality 
gap.  However,  LP  tightness  is  an  inevitable  condition  to 
guarantee  the  convergence  of  BP  to  the  optimal  solution, 
which  is  the  main  limitation  of  these  theoretical  studies 
towards  wider  applicability.  Moreover,  even  if  BP  converges 
to  the  optimal  solution,  its  convergence  speed  is  often  too 
slow  for  solving  large-scale  instances.  There  have  been 
also  empirical  studies  of  BP-based  algorithms  for  specific 
combinatorial  optimization  instances,  including  traveling 
salesman  [16],  graph  partitioning  [16],  Steiner  tree  [17]  and 
network  alignment  [18].  However,  their  focuses  are  not  on 
large-scale  instances  and  the  running  times  of  the  proposed 
algorithms  typically  grow  super-linearly  with  respect  to  the 
input  size.  In  contrast,  we  provide  a  generic  framework 
on  designing  BP-based  scalable,  parallel  algorithms  that 
are  widely  applicable  to  arbitrary  large-scale  combinatorial 
optimization  problems. 

Organization.  We  provide  backgrounds  on  BP  and  com¬ 
binatorial  optimization  problems  in  Section  II.  Section  III 
describes  our  BP-based  algorithm  design,  and  Section  IV 
provides  details  on  its  parallel  implementation.  We  evaluate 
our  algorithm  on  several  combinatorial  optimization  problems 
in  Section  V.  Finally,  we  conclude  in  Section  VI. 


factorizes  as  follows:  for  z  =  [zi\  E  {0,  l}n, 

Pr [Z  =  z\  OC  JJ  Ipi(Zi)  11  1pa(za), 
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where  {^,  are  non-negative  functions,  so-called  factors; 
F  is  a  collection  of  subsets 


F  =  {aua2,...,ak}  C  2{1’2’  ”’n} 

(each  otj  is  a  subset  of  {1,  2, . . . ,  n}  with  Kl  >  2); 
is  the  projection  of  z  onto  dimensions  included  in  a}  In 
particular,  ^  is  called  a  variable  factor.  Assignment  z*  is 
called  a  maximum-a-posteriori  (MAP)  assignment  if  z*  — 
argmax2G{0,i}n  Pr[z].  This  means  that  computing  a  MAP 
assignment  requires  us  to  compare  Pr[z]  for  all  possible  z, 
which  is  typically  computationally  intractable  (i.e.,  NP-hard) 
unless  the  induced  bipartite  graph  of  factors  F  and  variables 
z,  the  so-called  factor  graph,  has  a  bounded  tree  width  [19]. 

The  max-product  belief  propagation  (BP)  is  a  popular 
heuristic  for  approximating  the  MAP  assignment  in  GM.  BP 
is  implemented  iteratively;  at  each  iteration  t,  BP  maintains 
four  messages  {ra^_^(c) , m^a(c)  :  c  E  {0,1}}  between 
every  variable  Zi  and  every  associated  a  E  Fi,  where  Fi  := 
{a  E  F  :  i  E  a}.  The  messages  are  updated  as  follows: 
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One  can  reduce  the  complexity  of  messages  by  combining 
(1)  and  (2)  as: 

m¥?a(c)  =  Mc)  fi  max  ipa>(za>)  TT  mt^a,{zj). 

a'eFi\a  jEa'\i 

Given  a  set  of  messages  |m^a(c) ,  raa_^(c)  :  c  E  {0, 1}}, 
the  so-called  BP  marginal  beliefs  are  computed  as  follows: 


bi[zi]  ='ipi(zi)YlaeFima^i(zi).  (3) 

This  BP  algorithm  outputs  zBP  =  [ zfp ]  where 


„ BP 


1  if&i[l]>M0] 

<  ?  if  bi[l }  =  bi[ 0] 
0  if  bi[l]  <  bi[0] 


It  is  known  that  zBP  converges  to  a  MAP  assignment  after 
a  sufficient  number  of  iterations,  if  the  factor  graph  is  a  tree 
and  the  MAP  assignment  is  unique.  However,  if  the  graph 
contains  cycles  or  MAP  is  not  unique,  the  BP  algorithm  is 
not  guaranteed  to  converge  to  a  MAP  assignment  in  general. 


II.  Preliminaries 

A.  Graphical  Model  and  Belief  Propagation 

A  joint  distribution  of  n  (binary)  random  variables  Z  = 
[Zi]  E  {0,  l}n  is  called  a  Graphical  Model  (GM)  if  it 


B.  Belief  Propagation  for  Combinatorial  Optimization 

The  max-product  BP  can  be  applied  to  compute  an 
approximate  solution  for  any  ‘discrete’  optimizations.  This 

^or  example,  if  z  =  [0,1,  0]  and  a  =  {1,  3},  then  za  =  [0,  0]. 
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section  describes  the  maximum  weight  matching  problem  as 
an  example.  Given  a  graph  G  =  (V,E)  and  edge  weights 
w  =  [we\  G  it  finds  a  set  of  edges  such  that  each 

vertex  is  connected  to  at  most  one  edge  in  the  set  and  the 
sum  of  edge  weights  in  the  set  is  maximized.  The  problem 
is  formulated  as  the  following  IP  (Integer  Programming): 

maximize  w  •  x 

s.t.  xe  <  1,  Vn  e  V,  X  =  [xe\  e  {0, 1}|S|,  (4) 

e£d(v) 

where  S(v)  is  the  set  of  edges  incident  to  vertex  v  G  V. 
Now  consider  the  following  GM:  for  x  =  [xe]  G  {0, 1}^, 


where 


Pr[X  =  x]  oc  JJ  eWeXe  JJ  i/>v(xs(v)), 
eeE  vev 


'Ipv  {%8(v) ) 


1  if  EeeSlv)  Xe  <  1 

0  otherwise 


(5) 


One  can  easily  observe  that  the  MAP  assignments  for  GM  (5) 
correspond  to  the  (optimal)  solution  of  IP  (4).  Therefore,  one 
can  use  the  max-product  BP  for  solving  the  maximum  weight 
matching  problem,  where  the  algorithm  can  be  simplified 
using  di^j  =  log  as  described  in  Algorithm  1. 


Algorithm  1  BP  for  Maximum  Weight  Matching 
(ITERATION)  Calculate  new  messages  as  follows: 


t+i 


max 
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t 
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)} 


(DECISION)  For  each  edge  e  —  (i,j)  G  E,  decide 


{1  if  ( di^j  +  CLj^i)  <  We 
?  if  (ai^j  +  CLj^n)  -  We 
0  if  ( CL% — yj  CLj — >-i)  We 


Note  that  one  can  design  similar  BP  algorithms  for  arbitrary 
combinatorial  optimization  problems  (e.g.,  minimum  weight 
vertex  cover  and  maximum  weight  independent  set)  as  we 
demonstrate  in  Section  V. 

III.  Algorithm  Design 

The  main  goal  of  this  paper  is  to  design  BP-based  parallel 
algorithms  for  solving  combinatorial  optimizations.  To  this 
end,  one  can  design  a  BP  algorithm  as  described  in  Section 
II-B.  However  (a)  it  might  diverge  or  converge  very  slowly, 
(b)  even  if  it  converges  quickly,  the  BP  decision  might  be  not 
correct,  and  (c)  even  worse,  BP  might  produce  an  infeasible 
solution,  i.e.,  it  does  not  satisfy  the  constraints  of  the  problem. 

To  address  these  issues,  we  propose  a  generic  BP- 
based  framework  that  provides  highly  accurate  approximate 
solutions  for  combinatorial  optimization  problems.  The 
framework  has  two  steps,  as  shown  in  Figure  1.  In  the 
first  phase,  it  runs  a  BP  algorithm  for  a  fixed  number  of 
iterations  without  waiting  for  convergence.  Then,  the  second 


phase  runs  a  known  heuristic  using  BP  beliefs  instead  of  the 
original  weights  to  output  a  feasible  solution.  Namely,  the 
first  and  second  phases  are  respectively  designed  for  ‘BP 
weight  transforming’  and  ‘post-processing’.  In  the  following 
sections,  we  describe  the  two  phases  in  more  detail  in  reverse 
order.  For  illustration  purposes,  we  focus  on  the  maximum 
weight  matching  problem  while  the  results  are  derived  using 
Erdos-Renyi  random  graphs  with  1000  vertices,  average 
degree  100,  and  edge  weights  drawn  from  the  uniform 
distribution  over  the  interval  [0,1].  Later  in  Section  V, 
we  demonstrate  the  framework  is  applicable  to  any  other 
combinatorial  optimization  problems. 

A.  Post-Processing  Phase. 

The  decision  on  BP  beliefs  might  give  an  infeasible 
solution.  For  example,  in  the  maximum  weight  matching 
problem,  BP  decides  xe  =  0, 1  based  on  the  sign  of 
Wij  —  +  CLj^i)  (see  Algorithm  1),  but  the  edges  of 

xe  =  1  might  not  form  a  matching  even  if  BP  converges, 

i.e.,  there  exists  a  vertex  v  such  that  J2eeS(v)  xe  >  1- 

To  resolve  the  issue,  we  use  post-processing  by  utilizing 
existing  heuristics  to  the  given  problem  that  find  a  feasible 
solution.  Applying  post-processing  ensures  that  the  solution 
is  at  least  feasible.  In  addition,  our  key  idea  is  to  replace  the 
original  weights  by  the  logarithm  of  BP  beliefs,  i.e.,  the  new 
weight  on  edge  e  becomes:  Wij  —  (a^j  +  aj^i).  After  this, 
we  apply  known  heuristics  using  the  logarithm  of  BP  beliefs 
to  achieve  higher  accuracy. 

For  example,  the  following  ‘local’  greedy  algorithm  can 
be  used  as  a  post-processing  mechanism: 

1.  Initially,  all  vertices  are  ‘unmatched’,  i.e.,  xe  =  0  for 
all  eeE. 

2.  Choose  an  arbitrary  unmatched  vertex  i  and  match  it 

to  an  unmatched  vertex  j  ^  i  having  the  highest  value 
in  i.e.,  set  Xij  =  1. 

3.  Keep  iterating  step  2  until  no  more  vertex  can  be 
matched. 

To  confirm  the  effectiveness  of  the  proposed  post-processing 
mechanism,  we  compare  it  with  the  following  two  alternative 
post-processing  schemes  that  remove  edges  to  enforce 
matching  after  BP  processing  in  a  naive  manner: 

•  Random:  If  there  exists  a  vertex  v  such  that 
^2ees^v)  xe  >  1  on  the  BP  decision,  randomly  select 
one  edge  and  remove  other  edges. 

•  Weight:  If  there  exists  a  vertex  v  such  that  X]ee<50)  xe  > 
1  on  the  BP  decision,  remove  edges  of  smaller  weight. 

Figure  3(a)  compares  the  approximation  ratio  obtained 
using  BP-belief-based  post-processing  versus  the  naive  post¬ 
processing  heuristics  (random  and  weight).  It  shows  that 
the  proposed  BP-belief-based  post-processing  outperforms 
the  rest.  Note,  the  results  in  Figure  3  were  obtained  by 
first  applying  BP  message  passing  for  weight  transformation. 
Next,  we  explain  how  this  is  done  in  our  framework. 
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■  20  iterations  ■  100  iterations  □  5000  iterations 


Figure  1 :  Overview  of  our  generic  BP-based  framework 
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Figure  2:  Effects  of  initial  messages  on  the 
number  of  BP  iterations.  We  set 
a?_^.  =  —  c  •  Wij  for  a  value  c  of  x-axis. 
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Figure  3:  (a)  Average  approximation  ratio  for  different  post-processing  schemes.  We  use  a  local  greedy  algorithm  as  a  post-processing  based  on  original 
weights  and  BP  messages  (i.e.,  beliefs).  The  ‘Random  selection’  post-processing  is  also  compared,  (b)  Effects  of  initial  messages  on  the  convergence  of  BP. 
We  set  a?  .  =  =  c  •  for  the  value  c  of  x-axis  (c)  Approximation  ratio  for  different  initial  messages  =  0,  /2,  Wij 


B.  BP  Weight  Transforming  Phase 

To  improve  the  approximation  quality  and  solve  the 
convergence  issues,  we  use  three  modifications  to  the  standard 
BP  algorithm:  (1)  careful  initialization  on  messages,  (2)  noise 
addition  and  (3)  hybrid  damping  on  message  updates. 

Message  Initialization.  The  standard  message  initialization 
is  =  1,  i.e.,  =  0  for  the  maximum 

weight  matching  problem  (see  Algorithm  1).  The  convergence 
rate  of  BP  depends  on  the  initialized  messages.  As  reported 
in  Figure  3(b),  we  try  different  initializations,  =  c  • 
for  0  <  c  <  1,  where  the  case  c  =  0.5  shows  the  fastest 
convergence.  The  main  intuition  we  found  for  explaining 
such  a  phenomenon  is  as  follows.  If  ■  =  w^/ 2,  the  BP 
decision  at  the  initial  step  is  neutral,  i.e.,  xe  =?,  since  a£_^.  + 
CLj^i  =  On  the  other  hand,  if  a°_^.  =  0,  BP  chooses 
xe  =  1  initially  for  all  edges  and  most  likely  does  xe  =  0  for 
most  edges  in  the  next  step,  i.e.,  it  keeps  oscillating  between 
xe  =  1  and  xe  =  0  for  a  while.  The  choice  a°  •  =  Wij/2 
alleviates  the  fluctuation  behavior  of  BP  and  boosts  up  its 
convergence  speed. 

We  remind  that,  under  our  framework,  BP  runs  only 
for  a  fixed  number  of  iterations  since  it  might  converge 
too  slowly,  even  with  the  initialization  a£_^.  =  w^/2 ,  for 
practical  purposes.  With  fixed  number  of  iterations,  careful 
initialization  becomes  even  more  critical  as  experimental 
results  in  Figure  3(c)  and  Figure  2  suggest.  For  example,  if 
one  runs  5000  iterations  of  BP,  they  show  that  the  standard 
initialization  (a°_^.  =  0)  achieves  at  most  30%  approximation 
ratio,  while  the  proposed  method  (a°  .  =  Wij/2)  achieves 
99%.  Moreover,  one  can  also  observe  that  the  advantage 
of  more  BP  updates  diminishes  as  the  number  of  iterations 


#  vertices 
(#  edges) 

Approximation  Ratio 

Difference 

Multiple  optima 

Unique  optimum 

lk  (50k) 

99.88  % 

99.90  % 

-0.02  % 

5k  (250k) 

99.86  % 

99.85  % 

+0.01  % 

10k  (500k) 

99.85  % 

99.84  % 

+0.01  % 

20k  (1M) 

99.84  % 

99.83  % 

+0.01  % 

Table  I:  Approximation  ratio  of  BP  for  MWM  with  multiple  optima  and 
a  unique  optimum.  We  introduce  a  small  noise  to  the  edge  weights  and  set 
the  initial  message  by  a /2. 

becomes  large.  The  observation  holds  for  much  larger  graphs. 
Thus,  we  only  run  100  BP  iterations  in  our  algorithm  and 
do  not  wait  for  BP’s  convergence. 

Noise  Addition.  The  BP  algorithm  often  oscillates  when 
the  MAP  solution  is  not  unique.  To  address  this  issue,  we 
transform  the  original  problem  to  one  that  has  a  unique 
solution  with  high  probability  by  adding  small  noises  to  the 
weights,  i.e.,  we  <—  we  +  re,  where  re  G  [— r,  r]  is  a  random 
number  chosen  independently  across  edges.  We  apply  this  to 
all  cases.  Here,  one  has  to  be  careful  in  deciding  the  range 
r  of  noises.  If  re  is  too  large,  the  quality  of  BP  solution 
deteriorates  because  the  optimal  solution  might  have  changed 
from  the  original  problem.  On  the  other  hand,  if  re  is  too 
small  compared  to  we,  BP  converges  very  slowly.  To  achieve 
a  balance,  we  choose  the  range  r  of  noise  re  as  10%  of  the 
minimum  distance  among  weights.  We  find  that  this  results 
in  over  99.8%  approximation  ratio  even  when  the  solution 
is  not  unique,  which  has  little  difference  with  that  of  unique 
solution  as  shown  in  Table  I. 

Hybrid  Damping.  To  boost  up  the  convergence  speed  of 
BP  updates  further,  we  use  a  specific  damping  strategy  to 
alleviate  message  oscillation.  We  update  messages  to  be  the 
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#  vertices 
(#  edges) 

Approximation  Ratio 

no-damp(lOO) 

damp(100) 

no-damp(50) 

+damp(50) 

damp(50) 

+no-damp(50) 

10k  (500k) 

99.58  % 

99.69  % 

99.83  % 

99.56  % 

20k  (1M) 

99.55  % 

99.68  % 

99.82  % 

99.56  % 

50k  (2.5M) 

99.56  % 

99.69  % 

99.83  % 

99.57  % 

100k  (5M) 

99.56  % 

99.69  % 

99.83  % 

99.57  % 

Table  II:  Approximation  ratio  of  BP  without  damping,  BP  with  damping, 
BP  with  damping  only  for  first  50  iterations,  and  BP  with  damping  for  last 
50  iterations.  We  introduce  a  small  noise  to  the  edge  weights  and  set  the 
initial  message  by  a?_^.  =  =  Wij/2. 

average  of  old  and  new  messages. 

,  max  { max  {wik  -  0}  } 

We  note  that  the  damping  strategy  provides  a  similar  effect  as 
our  proposed  initialization  =  Wij/2.  Hence,  if  one  uses 
both,  the  effect  of  one  might  be  degraded  due  to  the  other.  Due 
to  this,  we  first  run  the  half  of  BP  iterations  without  damping 
(this  is  for  keeping  the  effect  of  the  proposed  initialization) 
and  perform  the  last  half  of  BP  iterations  with  damping. 
As  reported  in  Table  II,  this  hybrid  approach  outperforms 
other  alternatives,  including  (a)  no  use  of  damping,  (b)  using 
damping  in  every  iteration,  and  (c)  damping  in  the  first  half 
of  BP  iterations  and  no-damping  in  the  last  half. 

IV.  Parallel  Design  and  Implementation 

This  section  addresses  issues  in  parallelization  of  our 
algorithm.  First,  we  introduce  asynchronous  message  up¬ 
date  that  enables  efficient  parallelization  of  BP  message 
passing.  Second,  we  illustrate  the  issues  in  parallelizing  post¬ 
processing.  Finally,  we  describe  the  parallel  implementations 
of  our  algorithm  and  their  benefits. 

A.  Asynchronous  Message  Update 

So  far,  we  have  assumed  that  there  is  only  one  thread,  and 
BP  messages  are  updated  synchronously  among  vertices  after 
calculating  new  message  values.  Thus,  each  iteration  consists 
of  two  phases:  message  calculation  phase  and  message  update 
phase. 

For  parallelization,  we  first  divide  the  graph  by  partitioning 
the  vertices,  and  assign  each  partition  to  a  single  thread  (see 
Section  IV-C  for  details).  However,  if  we  naively  parallelize 
the  process  using  multiple  threads,  frequent  synchronization 
may  incur  large  overhead.  Thus,  we  apply  asynchronous 
message  update  where  each  vertex  updates  the  message  value 
right  after  new  message  value  is  calculated  and  eliminate 
synchronization  point  between  iterations.  This  makes  the 
process  faster  because  of  the  reduced  synchronization  points. 
Figure  4  shows  that  performance  improvement  (speed  up  in 
running  time)  of  asynchronous  update  over  synchronous 
is  up  to  237%  in  our  example  graph  for  the  maximum 
weight  matching  problem  with  16  threads2;  we  leave  detailed 

2We  use  a  machine  with  two  Intel  Xeon(R)  CPU  E5-2690  @  2.90GHz 
each  with  8  cores. 


evaluation  in  Section  V. 

We  now  discuss  its  impact  on  approximation  quality. 
Two  factors  marginally  affect  the  approximation  quality  in 
opposite  directions.  First,  updating  the  message  value  of  each 
vertex  right  after  its  message  calculation  marginally  improves 
the  approximation  ratio,  as  shown  in  Figure  5.  Updating  a 
message  right  after  its  calculation  on  a  individual  vertex 
basis  implicitly  has  a  similar  effect  to  applying  an  additional 
iteration,  which  improves  the  quality.  Second,  having  multiple 
threads  run  without  synchronizing  across  iterations  marginally 
degrade  the  approximation  quality.  The  reason  is  that  some 
threads  run  faster  than  others,  and  messages  from  the  slower 
threads  are  not  updated  as  frequently.  We  quantify  this  effect 
in  Figure  5  and  find  that  the  impact  is  marginal;  we  still 
achieve  99.9%  accuracy  with  multiple  threads. 

In  summary,  using  asynchronous  message  updates,  we 
speed  up  the  run-time  of  the  algorithm  by  up  to  240%, 
while  achieving  99.9%  approximation  ratio.  In  Section  V, 
we  show  that  the  results  also  extend  to  other  combinatorial 
optimization  problems. 

B.  Local  Post-Processing 

The  second  phase  of  our  algorithm  runs  existing  heuristics 
for  post-processing  to  enforce  the  feasibility  of  BP  decisions. 
While  the  framework  works  with  any  heuristics-based  post¬ 
processing  methods,  for  the  entire  process  to  be  parallel,  it 
is  important  that  the  post-processing  step  is  also  parallel.  An 
important  criterion  for  efficient  parallelization  is  locality  of 
computation;  if  the  post-processing  heuristics  can  compute 
the  result  locally  without  requiring  global  knowledge,  they 
can  be  easily  parallelized.  Moreover,  if  they  do  not  require 
synchronization,  the  running  time  can  be  further  reduced. 

Fortunately,  for  most  combinatorial  optimization  problems 
heuristics  that  match  the  two  criteria  exist.  A  local  greedy 
algorithm,  for  example,  enables  local  post-processing  (i.e., 
it  does  not  require  global  information),  and  require  little 
synchronization.  As  reported  in  Section  V,  we  also  evaluate 
our  algorithm  in  conjunction  with  other  post-processing 
heuristics  to  demonstrate  its  flexibility. 

C.  Parallel  Implementation 

The  BP  algorithm  is  easy  to  parallelize  because  of  its 
message  passing  nature.  To  demonstrate  this,  we  parallelize 
our  BP-based  framework  using  three  platforms: 

•  GraphChi  enables  large-scale  graph  computation  on  a 
personal  computer  by  utilizing  disk  drive  [5].  Using 
its  API,  we  specify  the  BP  message  update  rules  to 
enable  parallel  message  updates  and  scale  the  size  of 
the  problem  up  to  billions  of  variables. 

•  OpenMP  is  a  task-based  parallelization  library  developed 
by  Intel.  Simply  putting  OpenMP  pragma  directive 
enables  the  compiler  to  support  parallel  computation. 

•  For  our  pthread- based  implementation,  we  divide  a 
single  BP  iteration  into  the  smaller  execution  blocks 
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Figure  4:  Average  running  time  of  our  BP-based 
algorithm  with  synchronous  message  update  and 
asynchronous  message  update.  We  apply  all  three 
modifications  to  BP  of  Section  III-B. 
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Figure  5 :  Approximation  ratio  of  our  BP-based 
algorithm  with  synchronous  and  asynchronous  message 
update  with  16  threads.  We  apply  all  three 
modifications  to  BP  of  Section  III-B. 


#  Vertices  #  Edges 

apache 1 

80k 

230k 

apache2 

715k 

2M 

ecology2 

1M 

2M 

G3_circuit 

1.6M 

3M 

boneOlO 

1M 

23.4M 

Table  III:  Summary  of  MWM 
data-sets 


#  Vertices 

Optimal  Cost 

MVC 

MIS 

frb-30-15 

450 

420 

30 

frb-45-21 

945 

900 

45 

frb-53-24 

1,272 

1,219 

53 

frb-59-26 

1,534 

1,475 

59 

Table  IV:  Summary  of  MWVC 
and  MWIS  data-sets 


called  tasks.  Each  iteration  is  divided  into  per- thread 
tasks.  Because  we  have  fixed  multiple  number  of  BP 
iterations,  it  concerns  many  more  tasks.  We  put  these 
tasks  in  a  task  queue.  Initially  all  threads  are  assigned 
a  task  and  the  thread  finished  its  task  will  be  assigned 
the  next  task  in  the  task  queue.  This  minimizes  the 
overlap  between  different  iterations  and  synchronization 
points,  which  reduces  the  run  time  while  obtaining  high 
approximation  ratio. 

Algorithm  2  BP  for  Minimum  Weight  Vertex  Cover 

(ITERATION)  Calculate  new  messages  as  follows: 


aV^j  min  <  ^  ajUi,  0 

[  keS(i)\{j} 

(DECISION)  For  each  vertex  i  G  V,  decide 


/ 

1 

if 

< 

-Wi 

Xi  =  < 

? 

if 

J2jeS(i)  a3^i 

—  Wi 

0 

if 

T,j£5(i)  ai^i 

> 

-Wi 

Algorithm  3  BP  for  Maximum  Weight  Independent  Set 
(ITERATION)  Calculate  new  messages  as  follows: 


a^j  <-  max  <  wv  ^  ajL^,  0 
[  kE5(i)\{j} 

(DECISION)  For  each  vertex  i  G  V,  decide 


1 

if 

<  Wi 

Xi  —  < 

? 

if 

ao^i 

—  Wi 

0 

if 

>  Wi 

V.  Evaluation 

We  evaluate  our  BP  framework  using  three  popular  combi¬ 
natorial  optimization  problems:  maximum  weight  matching, 
minimum  weight  vertex  cover  and  maximum  weight  indepen¬ 
dent  set  problem.  We  perform  extensive  empirical  evaluation 
to  demonstrate  the  benefit  of  our  algorithm  for  the  following 
three  questions: 

1)  Does  the  BP-based  algorithm  provide  high  approxima¬ 
tion  ratio ? 


2)  Can  the  algorithm  achieve  speed-up  due  to  parallel 
implementations  ? 

3)  Can  it  solve  large-scale  problems  involving  billions  of 
variables? 

We  already  introduced  the  IP  formulation  of  the  maximum 
weight  matching  (MWM)  in  (4),  where  those  of  the  mini¬ 
mum  weight  vertex  cover  (MWVC)  and  maximum  weight 
independent  set  problem  (MWIS)  are  as  follows: 

MWVC:  minimize  w  •  x 

subject  to  xu  +  xv  >  1,  Ve  =  ( u ,  v)  E  E 

x  =  [xv\  e  {o,  i}|y|  (6) 

MWIS:  maximize  w  •  x 

subject  to  xu  +  xv  <  1,  Ve  =  (u,  v)  E  E 

x=[xe]€{0,l}W  (7) 

We  provide  descriptions  of  BP  algorithms  in  Algorithm  2-3 
for  MWVC  and  MWIS.  As  we  propose  in  Section  III-B,  we 
choose  initial  BP  messages  for  neutral  decisions:  = 

—Wij/\S(i)\  for  vertex  cover  and  =  Wij/\5(i)\  for 

independent  set. 

A.  Experiment  Setup 

In  our  experiments,  both  real-world  and  synthetic  data¬ 
sets  are  used  for  evaluation.  For  MWM,  we  used  data-sets 
from  the  university  of  Florida  sparse  matrix  collection  [20] 
summarized  in  Table  III.  For  larger  scale  synthetic  evaluation, 
we  generate  Erdos-Renyi  random  graphs  (up  to  50  million 
vertices  with  2.5  billion  edges)  with  average  vertex  degree  of 
100  with  edge  weights  drawn  independently  from  the  uniform 
random  distribution  over  the  interval  [0,1].  For  MWVC 
and  MWIS,  we  use  the  frb-series  from  BHOSLIB  [21] 
summarized  in  Table  IV,  where  it  also  contains  the  optimal 
solutions.  We  note  that  we  perform  no  experiment  using 
synthetic  data-sets  for  MWVC  and  MWIS  since  they  are 
NP-hard  problems,  i.e.,  impossible  to  compute  the  optimal 
solutions.  On  the  other  hand,  for  MWM  the  Edmonds’ 
Blossom  algorithm  [10]  can  compute  the  optimal  solution 
in  polynomial  time.  All  experiments  in  this  section  are 
conducted  on  a  machine  with  Intel  Xeon(R)  CPU  E5-2690 
@  2.90GHz  with  8  cores  and  8  hyperthreads  with  128GB  of 
memory,  unless  otherwise  noted. 
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Approximation  Ratio 

Serial  BP 

Parallel  BP 

(1  thread) 

(16  threads) 

500k  (25M) 

99.93  % 

99.90  % 

syntnetic 

$  vertices 

1M  (50M) 

99.93  % 

99.90  % 

(it  pHqpc) 

2M  (100M) 

99.94  % 

99.91  % 

\Tr  CUgCa ) 

5M  (250M) 

99.93  % 

99.90  % 

apache 1 

100.0  % 

100.0  % 

Real-world 

apache2 

100.0  % 

100.0  % 

Data-set 

ecology2 

100.0  % 

100.0  % 

(Florida) 

G3_circuit 

99.95  % 

99.95  % 

boneOlO 

99.11  % 

99.12  % 

Table  V :  MWM:  Approximation  ratio  of  our  BP-based  algorithm  on 
synthetic  and  sparse  matrix  collection  data-sets  [20]. 
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Figure  6:  MW  VC:  Average  approximation  ratio  of  our  BP-based 
algorithm,  the  2-approximation  algorithm  and  the  greedy  algorithm  on 
frb-series  data-sets. 


B.  Approximation  Ratio 

We  now  demonstrate  our  BP-based  approximation  algo¬ 
rithm  produces  highly  accurate  results.  In  particular,  we 
show  that  our  BP-based  algorithms  outperform  well-known 
heuristics  for  MWVC,  MWIS  and  closely  approximate  exact 
solutions  for  MWM  for  all  cases  we  evaluate. 


Maximum  Weight  Matching.  For  MWM,  we  compare  the 
approximation  qualities  of  serial,  synchronous  BP  (i.e.,  one 
thread)  in  Section  III  and  parallel,  asynchronous  imple¬ 
mentation  (i.e.,  using  multiple  threads)  in  Section  IV  on 
both  synthetic  and  real-world  data-sets,  where  we  compute 
the  optimal  solution  using  the  Blossom  algorithm  [10]  to 
measure  the  approximation  ratios.  Table  V  summarize  our 
experimental  results  for  MWM  for  the  synthetic  data-sets 
and  the  Florida  data.  Our  BP-based  algorithm  achieves  99% 
to  99.9%  approximation  ratios. 

Minimum  Weight  Vertex  Cover.  For  MWVC,  we  use  two 
post-processing  procedures:  greedy  and  2- approximation 
algorithm  [22].  For  the  local  greedy  algorithm,  we  choose 
a  random  edge  and  add  one  of  its  adjacent  vertices  with  a 
smaller  weight  until  all  edges  are  covered.  We  compare  the 
approximation  qualities  of  our  BP-based  algorithm  compared 
to  the  cases  when  one  uses  only  the  greedy  algorithm  and  the 
2-approximation  algorithm.  Figure  6  shows  the  experimental 
results  for  the  two  post-processing  heuristics.  The  results 
show  that  our  BP-based  weight  transformation  enhances  the 
approximation  quality  of  known  approximation  heuristics  by 
up  to  43%. 
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Figure  7:  MWIS:  Average  approximation  ratio  of  our  BP-based 
algorithm  and  the  greedy  algorithm  on  frb-series  data-sets. 

Maximum  Weight  Independent  Set.  For  MWIS,  the  exper¬ 
iment  was  performed  on  frb-series  data-sets.  We  use  a  greedy 
algorithm  as  the  post-processing  procedure,  which  selects 
vertices  in  the  order  of  higher  weights  until  no  vertex  can 
be  selected  without  violating  the  independent  set  constraint. 
We  compare  the  approximation  qualities  of  our  BP-based 
algorithm  and  the  standard  greedy  algorithm.  Figure  7  shows 
that  our  BP-based  framework  enhances  the  approximation 
ratio  of  the  solution  by  2%  to  23%. 

C.  Parallelization  Speed-up 

One  of  the  important  advantages  of  our  BP-based  algorithm 
is  that  it  is  fast,  while  delivering  high  approximation 
guarantees.  In  this  section,  we  focus  on  the  speed-up  due 
to  parallelization.  Figure  8  compares  the  running  time  of 
the  Blossom  algorithm  and  our  BP-based  algorithm  with 
1  single  core  and  16  cores.  With  five  million  vertices, 
our  asynchronous  parallel  implementation  is  eight  times 
faster  than  the  synchronous  serial  implementation,  while 
still  retaining  99.9%  approximation  ratio  as  reported  in 
Table  V.  To  demonstrate  the  overall  benefit  in  context,  we 
compare  its  running  time  with  that  of  the  current  fastest 
implementation  of  the  Blossom  algorithm  due  to  Kolmogorov 
[10].  Here,  we  note  that  the  Blossom  algorithm  is  inherently 
not  easy  to  parallelize.  For  parallel  implementation,  we  report 
results  for  our  pthread  implementation  ,  but  the  OpenMP 
implementation  also  show  comparable  performance.  For 
20  million  vertices  (one  billion  edges),  it  shows  that  the 
running  time  of  our  algorithm  can  be  accelerated  by  up 
to  71  times  than  the  Blossom  algorithm,  while  sacrificing 
0.1%  of  accuracy.  The  running  time  gap  is  expected  be 
more  significant  for  larger  graphs  since  the  running  times 
of  our  algorithm  and  the  Blossom  algorithm  are  linear  and 
cubic  with  respect  to  the  number  of  vertices,  respectively. 
We  also  experiments  our  algorithms  for  other  problems  to 
demonstrate  the  parallelization  speedup.  Due  to  the  space 
limitation,  we  only  report  that  for  the  minimum  weight  vertex 
cover  problem  in  Figure  9. 

D.  Large-scale  Optimization 

Our  algorithm  can  also  handle  large-scale  instances  be¬ 
cause  it  is  based  on  GMs  that  inherently  lend  itself  to  parallel 
and  distributed  implementations.  To  demonstrate  this,  we 
create  a  large-scale  instance  containing  up  to  50  million 
vertices  and  2.5  billion  edges.  We  experiment  our  algorithm 
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Figure  8:  MWM:  Running  time  of  Blossom 
algorithm  and  our  BP-based  algorithms. 


Figure  9:  MWVC:  Running  time  of  our  Figure  10:  MWM  and  MWVC:  Running  time 
parallel  BP-based  algorithm  (pthread  and  memory  usage  of  GraphChi-based 

implementation)  on  large-scale  graphs.  implementation  on  large-scale  graphs. 


using  GraphChi  on  a  single  consumer  level  machine  with  i7 
CPU  and  24GB  of  memory.  Figure  10  shows  the  running  time 
and  memory  usage  of  our  algorithm  for  MWM  and  MWVC 
on  large  data-sets.  For  large  graphs,  GraphChi  partitions 
them  to  load  parts  of  them  in  memory,  while  storing  the  rest 
on  disk  by  leveraging  graph-based  ‘local’  structures.  Thus, 
we  were  able  to  solve  problems  with  2.5  billion  edges  on  a 
single  machine.  In  contrast,  Kolmogorov’s  implementation 
[10]  of  the  Blossom  algorithm  cannot  handle  such  large 
graphs  because  distributed  processing  is  difficult  (e.g.,  it 
cannot  handle  more  than  6M  vertices  on  the  same  machine). 
Similarly,  our  algorithm  can  be  run  on  multiple  machines 
to  scale  to  even  larger  problems.  However,  we  leave  this  as 
future  work. 

VI.  Conclusion 

This  paper  explores  the  possibility  of  applying  the  BP 
algorithm  to  solve  generic  combinatorial  optimizations  at 
scale.  We  propose  BP-based  algorithm  that  achieves  high 
approximation  ratio  and  allows  parallel  implementation. 
We  evaluate  the  algorithm’s  effectiveness  and  performance 
by  applying  our  framework  on  three  popular  combinato¬ 
rial  optimization  problems.  Our  evaluation  shows  that  the 
algorithm  outperforms  existing  approximation  algorithms 
across  many  instances  and  is  able  to  solve  large-scale 
problems  with  billions  of  variables.  We  believe  our  BP- 
based  framework  is  of  broader  interest  for  a  wider  class  of 
large-scale  optimization  tasks. 
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